We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large Language Model Reinforcement Learning Policy (LLaRP), adapts a pre-trained, frozen LLM to take as input text instructions and egocentric visual observations and to output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves a 42% success rate, 1.7 times the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language-conditioned, multi-task, embodied AI problems, we release a new benchmark, Language Rearrangement, consisting of 150,000 training tasks and 1,000 testing tasks for language-conditioned rearrangement.
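To make the setup above concrete, the following is a minimal PyTorch sketch of the kind of policy described: a frozen LLM backbone, a trainable adapter that projects egocentric visual features into the LLM's embedding space, and trainable action/value heads suitable for reinforcement learning. All names (e.g. `LLaRPStylePolicy`, `obs_adapter`), dimensions, and the toy transformer standing in for the LLM are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class LLaRPStylePolicy(nn.Module):
    """Sketch of an LLaRP-style policy: frozen LLM backbone, trainable
    visual-observation adapter, and trainable action/value heads.
    Module names and sizes are illustrative placeholders."""

    def __init__(self, llm: nn.Module, embed_dim: int, num_actions: int,
                 obs_feat_dim: int = 512):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():
            p.requires_grad_(False)  # keep the LLM frozen; only adapters are trained
        # Trainable adapter: maps visual features into the LLM embedding space.
        self.obs_adapter = nn.Linear(obs_feat_dim, embed_dim)
        # Trainable heads: action logits and a value estimate for RL (e.g. PPO).
        self.action_head = nn.Linear(embed_dim, num_actions)
        self.value_head = nn.Linear(embed_dim, 1)

    def forward(self, instr_embeds: torch.Tensor, obs_feats: torch.Tensor):
        # instr_embeds: (B, T_text, embed_dim) instruction token embeddings.
        # obs_feats:    (B, T_obs, obs_feat_dim) per-step egocentric visual features.
        obs_embeds = self.obs_adapter(obs_feats)
        seq = torch.cat([instr_embeds, obs_embeds], dim=1)
        hidden = self.llm(seq)          # frozen LLM processes instruction + observations
        last = hidden[:, -1]            # hidden state at the most recent observation
        return self.action_head(last), self.value_head(last)


if __name__ == "__main__":
    # A tiny transformer encoder stands in for the pre-trained LLM, purely for illustration.
    embed_dim = 64
    backbone = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
        num_layers=2,
    )
    policy = LLaRPStylePolicy(backbone, embed_dim=embed_dim, num_actions=10)
    instr = torch.randn(2, 8, embed_dim)   # placeholder instruction embeddings
    obs = torch.randn(2, 5, 512)           # placeholder visual features
    logits, value = policy(instr, obs)
    print(logits.shape, value.shape)       # torch.Size([2, 10]) torch.Size([2, 1])
```

In this sketch only `obs_adapter`, `action_head`, and `value_head` receive gradients, matching the idea of adapting a frozen LLM to act through environment interaction rather than fine-tuning the language model itself.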