Large pre-trained models show increasingly strong performance on reasoning and planning tasks across different modalities, opening up the possibility of leveraging them for complex sequential decision-making problems. In this paper, we investigate the capabilities of large language models (LLMs) for reinforcement learning (RL) in a variety of interactive domains. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models to train an agent with RL. Our results show that, even without task-specific fine-tuning, LLMs excel at reward modeling. In particular, crafting rewards through AI feedback is the most broadly applicable approach and can improve performance by strengthening credit assignment and exploration. Finally, in environments with unknown dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward-modeling capabilities while mitigating catastrophic forgetting, further expanding their utility in sequential decision-making tasks.
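As a rough illustration of the AI-feedback route mentioned above (not the paper's implementation), the sketch below scores rollouts with a stand-in for an LLM call and uses those scores as rewards for a toy policy update. The environment, the function `query_llm_preference`, and all other names here are hypothetical; a real system would prompt an LLM with the task and trajectory description and parse a scalar score or preference from its response.

```python
# Minimal sketch, assuming AI feedback means an LLM-derived scalar score per
# trajectory that replaces the environment reward. The LLM call is mocked so
# the example runs offline; the 1-D "reach the goal" task is illustrative only.
import random

GOAL = 5  # hypothetical goal position on a 1-D line


def query_llm_preference(trajectory):
    """Hypothetical stand-in for querying an LLM to score a trajectory.

    A real system would send the task description plus the trajectory to an
    LLM and parse a scalar score; here a heuristic approximates that feedback.
    """
    final_pos = trajectory[-1]
    return 1.0 if final_pos == GOAL else -abs(final_pos - GOAL) / GOAL


def rollout(p_right, length=10):
    """Roll out a trajectory on a 1-D line starting at position 0."""
    pos, traj = 0, [0]
    for _ in range(length):
        pos += 1 if random.random() < p_right else -1
        traj.append(pos)
    return traj


def train(episodes=200, lr=0.05):
    """Nudge a one-parameter policy toward trajectories the 'LLM' prefers."""
    p_right = 0.5
    for _ in range(episodes):
        traj = rollout(p_right)
        reward = query_llm_preference(traj)  # AI feedback as the reward signal
        # Crude improvement step: if a rightward-moving trajectory was scored
        # well, increase the probability of moving right (and vice versa).
        moved_right = 1.0 if (traj[-1] - traj[0]) > 0 else -1.0
        p_right = min(0.99, max(0.01, p_right + lr * reward * moved_right))
    return p_right


if __name__ == "__main__":
    print(f"P(right) after training: {train():.2f}")
```

In this toy setting the learned probability of moving right rises toward the goal side, showing how an LLM-derived score can stand in for a hand-crafted reward; the paper's actual agents, environments, and update rules are of course far richer than this sketch.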