Reinforcement learning (RL) is a popular approach to training autonomous agents that can learn to perform complex tasks by interacting with their environment. Through a reward signal, RL allows agents to learn the best action in different situations and to adapt to their environment.
A big challenge in RL is how to efficiently explore the vast state space of many real-world problems. This challenge arises because RL agents learn by interacting with their environment through exploration. Think of an agent trying to play Minecraft. If you have seen the Minecraft crafting tree, you know how complicated it looks: there are hundreds of craftable items, and crafting one often requires crafting another first, and so on. It is a genuinely complex environment.
Since the environment can have an enormous number of possible states and actions, it is difficult for the agent to find the optimal policy through random exploration alone. The agent must balance exploiting its current best policy with exploring new parts of the state space that could lead to a better one. Finding efficient exploration methods that strike this balance is an active area of research in RL. A minimal sketch of this trade-off follows below.
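To make the exploration-exploitation trade-off concrete, here is a minimal sketch of epsilon-greedy action selection, one of the simplest ways an RL agent balances the two. This is a generic illustration, not code from the DECKARD work; the function and values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit

# Example: estimated values for four actions in some state
q = [0.2, 0.8, 0.5, 0.1]
action = epsilon_greedy(q, epsilon=0.1)
```

With a small epsilon the agent mostly exploits what it already knows, which is exactly why random exploration scales so poorly in environments like Minecraft: the chance of stumbling onto a long crafting chain by accident is vanishingly small.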
It is well known that practical decision-making systems need to use prior knowledge about a task efficiently. With prior information about the task itself, the agent can better shape its policy and avoid getting locked into sub-optimal behavior. However, most reinforcement learning methods today train from scratch, without any pretraining or external knowledge.
But why is that the case? In recent years, there has been growing interest in using large language models (LLMs) to assist RL agents in exploration by providing external knowledge. This approach has shown promise, but many challenges remain, such as grounding the LLM's knowledge in the environment and dealing with the accuracy of LLM outputs.
So should we stop using LLMs to help RL agents? Or can these issues be fixed so that LLMs can reliably guide RL agents? The answer has a name, and it is DECKARD.
DECKARD is trained on Minecraft, since crafting a specific item in Minecraft is a challenging task without expert knowledge of the game. Studies have shown that achieving goals in Minecraft becomes much easier with dense rewards or expert demonstrations, which is why crafting items in Minecraft has remained a persistent challenge for AI.
DECKARD uses few-shot prompting with a large language model (LLM) to generate an Abstract World Model (AWM) over subgoals. In the Dream phase, the LLM hypothesizes the AWM, effectively dreaming up the task's subgoals and the steps to solve them. In the Wake phase, the agent learns a modular policy for the subgoals generated during dreaming. Because this phase runs in the real environment, DECKARD can verify the hypothesized AWM: the AWM is corrected during the wake phase, and discovered nodes are marked as verified for future use. A simplified sketch of this loop is shown below.
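To make the dream and wake phases concrete, here is a simplified Python sketch of the loop described above, under the assumption that the AWM can be modeled as a dependency graph over subgoals. The names `llm.propose_awm`, `llm.propose_prereqs`, and `learn_subgoal_policy` are hypothetical placeholders standing in for the paper's actual components, not the real implementation.

```python
def deckard_loop(goal, env, llm, learn_subgoal_policy):
    # Dream phase: the LLM hypothesizes an abstract world model (AWM),
    # a graph mapping each subgoal to its hypothesized prerequisites,
    # e.g. {"stone_pickaxe": ["wooden_pickaxe"], "wooden_pickaxe": []}.
    awm = llm.propose_awm(goal)
    verified = set()

    # Wake phase: learn a modular policy per subgoal in the real
    # environment, verifying or correcting the hypothesized AWM.
    for subgoal in topological_order(awm):
        success = learn_subgoal_policy(env, subgoal, prereqs=awm.get(subgoal, []))
        if success:
            verified.add(subgoal)  # node confirmed; reusable in future tasks
        else:
            # Hypothesis was wrong: re-query the LLM for this node's
            # prerequisites and correct the AWM.
            awm[subgoal] = llm.propose_prereqs(subgoal)
    return awm, verified

def topological_order(awm):
    # Order subgoals so prerequisites come before dependents.
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in awm.get(node, []):
            visit(dep)
        order.append(node)
    for node in awm:
        visit(node)
    return order
```

The key design idea is that the LLM's output is treated as a hypothesis to be tested in the environment rather than as ground truth, which is how DECKARD sidesteps the accuracy problems of LLM-generated knowledge.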
Experiments show that LLM guidance is essential for exploration in DECKARD: a version of the agent without LLM guidance takes twice as long to craft most items during open-ended exploration. When exploring toward a specific task, DECKARD improves sample efficiency by an order of magnitude over comparable agents, demonstrating the potential for robustly applying LLMs to RL.
Check out the Paper, Code, and Project for more details.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.