Reinforcement learning (RL) is a popular approach to training autonomous agents that can learn to perform complex tasks by interacting with their environment. Through a reward signal, RL allows agents to learn the best action in different situations and to adapt to their environment.
A big challenge in RL is how to efficiently explore the vast state space of many real-world problems. This challenge arises because RL agents learn by interacting with their environment through exploration. Think of an agent trying to play Minecraft. If you have seen the Minecraft crafting tree, you know how complicated it looks: there are hundreds of craftable items, and crafting one often requires crafting another first, and so on. It is a genuinely complex environment.
Since the environment can have an enormous number of possible states and actions, it is difficult for the agent to find the optimal policy through random exploration alone. The agent must balance exploiting its current best policy with exploring new parts of the state space that could lead to a better one. Finding efficient exploration methods that strike this balance is an active area of research in RL. A minimal sketch of this trade-off follows below.
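To make the exploration-exploitation trade-off concrete, here is a minimal sketch of epsilon-greedy action selection, one of the simplest ways an RL agent balances the two. This is a generic illustration, not code from the DECKARD work; the function and values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit

# Example: estimated values for four actions in some state
q = [0.2, 0.8, 0.5, 0.1]
action = epsilon_greedy(q, epsilon=0.1)
```

With a small epsilon the agent mostly exploits what it already knows, which is exactly why random exploration scales so poorly in environments like Minecraft: the chance of stumbling onto a long crafting chain by accident is vanishingly small.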
It is well known that practical decision-making systems need to use prior knowledge about a task efficiently. With prior information about the task itself, the agent can better shape its policy and avoid getting locked into sub-optimal behavior. However, most reinforcement learning methods today train from scratch, without any pretraining or external knowledge.
But why is that the case? In recent years, there has been growing interest in using large language models (LLMs) to assist RL agents in exploration by providing external knowledge. This approach has shown promise, but many challenges remain, such as grounding the LLM's knowledge in the environment and dealing with the accuracy of LLM outputs.
So should we stop using LLMs to help RL agents? Or can these issues be fixed so that LLMs can reliably guide RL agents? The answer has a name, and it is DECKARD.
DECKARD is trained on Minecraft, since crafting a specific item in Minecraft is a challenging task without expert knowledge of the game. Studies have shown that achieving goals in Minecraft becomes much easier with dense rewards or expert demonstrations, which is why crafting items in Minecraft has remained a persistent challenge for AI.
DECKARD uses few-shot prompting with a large language model (LLM) to generate an Abstract World Model (AWM) over subgoals. In the Dream phase, the LLM hypothesizes the AWM, effectively dreaming up the task's subgoals and the steps to solve them. In the Wake phase, the agent learns a modular policy for the subgoals generated during dreaming. Because this phase runs in the real environment, DECKARD can verify the hypothesized AWM: the AWM is corrected during the wake phase, and discovered nodes are marked as verified for future use. A simplified sketch of this loop is shown below.
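To make the dream and wake phases concrete, here is a simplified Python sketch of the loop described above, under the assumption that the AWM can be modeled as a dependency graph over subgoals. The names `llm.propose_awm`, `llm.propose_prereqs`, and `learn_subgoal_policy` are hypothetical placeholders standing in for the paper's actual components, not the real implementation.

```python
def deckard_loop(goal, env, llm, learn_subgoal_policy):
    # Dream phase: the LLM hypothesizes an abstract world model (AWM),
    # a graph mapping each subgoal to its hypothesized prerequisites,
    # e.g. {"stone_pickaxe": ["wooden_pickaxe"], "wooden_pickaxe": []}.
    awm = llm.propose_awm(goal)
    verified = set()

    # Wake phase: learn a modular policy per subgoal in the real
    # environment, verifying or correcting the hypothesized AWM.
    for subgoal in topological_order(awm):
        success = learn_subgoal_policy(env, subgoal, prereqs=awm.get(subgoal, []))
        if success:
            verified.add(subgoal)  # node confirmed; reusable in future tasks
        else:
            # Hypothesis was wrong: re-query the LLM for this node's
            # prerequisites and correct the AWM.
            awm[subgoal] = llm.propose_prereqs(subgoal)
    return awm, verified

def topological_order(awm):
    # Order subgoals so prerequisites come before dependents.
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in awm.get(node, []):
            visit(dep)
        order.append(node)
    for node in awm:
        visit(node)
    return order
```

The key design idea is that the LLM's output is treated as a hypothesis to be tested in the environment rather than as ground truth, which is how DECKARD sidesteps the accuracy problems of LLM-generated knowledge.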
Experiments show that LLM guidance is essential for exploration in DECKARD: a version of the agent without LLM guidance takes twice as long to craft most items during open-ended exploration. When exploring toward a specific task, DECKARD improves sample efficiency by an order of magnitude over comparable agents, demonstrating the potential for robustly applying LLMs to RL.
Check out the Paper, Code, and Project for more details.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.