Interactive digital agents (IDAs) leverage the APIs of digital environments to perform tasks in response to user requests. While IDAs powered by instruction-tuned large language models (LLMs) can react to feedback from interface invocations over multi-step exchanges, they have not been trained in their respective digital environments. Such methods accomplish fewer than half of the tasks on sophisticated benchmarks such as AppWorld. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We formalize this training as a partially observable Markov decision process and derive LOOP, a data- and memory-efficient variant of proximal policy optimization. LOOP uses no value network and maintains exactly one copy of the underlying LLM in memory, making its implementation straightforward and as memory-efficient as fine-tuning a single LLM. A 32-billion-parameter agent trained with LOOP in the AppWorld environment outperforms the much larger OpenAI o1 agent by 9 percentage points (15% relative). To our knowledge, this is the first reported application of RL to agents that interact with a multi-domain, multi-app environment through direct API calls. Our analysis sheds light on the effectiveness of RL in this setting, showing that the agent learns to consult the API documentation, avoid unwarranted assumptions, minimize confabulation, and recover from setbacks.
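The abstract states only that LOOP is a PPO variant with no value network that keeps a single copy of the LLM in memory. As a rough illustration of how advantages can be estimated without a learned critic, the sketch below uses a leave-one-out baseline over several rollouts of the same task; this concrete baseline choice, and all function names, are assumptions for illustration rather than the paper's confirmed formulation.

```python
# Hedged sketch: PPO-style update without a value network.
# The leave-one-out baseline over K rollouts per task is an assumed
# instantiation; the abstract only says LOOP uses no value network.
import numpy as np


def leave_one_out_advantages(returns: np.ndarray) -> np.ndarray:
    """Baseline each rollout's return with the mean return of the other
    K-1 rollouts of the same task, instead of a learned value function."""
    k = len(returns)
    baselines = (returns.sum() - returns) / (k - 1)  # mean of the peers
    return returns - baselines


def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, with the rollout-level advantage
    broadcast to that rollout's actions (tokens)."""
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()


# Toy usage: 4 rollouts of one AppWorld-style task with terminal task scores.
returns = np.array([1.0, 0.0, 1.0, 0.5])
print(leave_one_out_advantages(returns))  # above-peer rollouts get positive advantage
```

Because the baseline comes from sampled rollouts rather than a critic, only the policy LLM needs to be held in memory, which is consistent with the abstract's claim that training is as memory-efficient as fine-tuning a single LLM.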