With the recent introduction of large language models (LLMs), the field of artificial intelligence (AI) has advanced significantly. Although these models have demonstrated impressive performance on tasks such as content generation and question answering, they still struggle with complicated, open-ended queries that require interaction with other tools or APIs.
Outcome-based systems, where feedback is easily obtained, are effective for simpler tasks, while for more complex problems a process-supervision approach is useful, in which workflows are defined through human-interpretable task decompositions. These workflows, called LLM agents, use external tools or APIs to carry out multi-step processes toward a goal. The sample task considered here is answering complicated queries by gathering data through a search API and composing a one-paragraph response.
Existing models that answer complex natural-language questions requiring multi-step reasoning and the integration of external information often fail because interactions with external knowledge are non-differentiable; such models cannot be trained end-to-end, so there is no simple way to correct these errors.
To address these challenges, a team of Google researchers proposed a ReAct-style LLM agent that can reason and act on external information. Because it can handle multi-step procedures, the ReAct-style agent can respond efficiently to complex queries.
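To make the loop concrete, here is a minimal sketch of a ReAct-style agent in Python. It is an illustration, not the paper's implementation: `call_llm`, `web_search`, and the `Search[...]`/`Final Answer:` prompt conventions are hypothetical stand-ins for an LLM endpoint, a search API, and the agent's action format.

```python
# Minimal ReAct-style agent loop (an illustrative sketch, not the paper's code).
# `call_llm` and `web_search` are hypothetical stand-ins for an LLM endpoint and
# a search API; the Thought/Action/Observation format is a simplified ReAct.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Hypothetical stand-in for a search API returning text snippets."""
    raise NotImplementedError

def react_agent(question: str, max_steps: int = 5) -> str:
    trajectory = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model interleaves a free-form thought with a structured action.
        step = call_llm(trajectory + "Thought:")
        trajectory += f"Thought: {step}\n"
        if "Final Answer:" in step:
            # The agent has gathered enough evidence to answer.
            return step.split("Final Answer:")[-1].strip()
        if "Search[" in step:
            # Execute the tool call and feed the observation back to the model.
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            trajectory += f"Observation: {web_search(query)}\n"
    # Step budget exhausted: ask the model for a best-effort final answer.
    return call_llm(trajectory + "Final Answer:").strip()
```

Each iteration appends the model's thought, any tool observation, and eventually the final one-paragraph answer to a growing trajectory, which is what makes these traces usable as training data later.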
The team has come up with a ReST-like technique to further improve performance and handle failure scenarios. This technique uses a growing-batch reinforcement learning strategy with AI feedback, allowing for iterative training on previous trajectories. The primary goal is to let the agent continually improve and self-distill over time.
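The following sketch shows one way such a ReST-like growing-batch loop could look. All the helpers are assumptions for illustration, not the paper's actual training code: `ai_rank` stands in for an AI-feedback scorer and `fine_tune` for a supervised fine-tuning step, with an assumed quality threshold.

```python
# ReST-like growing-batch self-improvement loop (an assumed shape, not the
# paper's training code). `ai_rank` is a hypothetical AI-feedback scorer and
# `fine_tune` a hypothetical supervised fine-tuning step on trajectories.

SCORE_THRESHOLD = 0.8  # assumed cutoff for keeping a trajectory

def ai_rank(trajectory: str) -> float:
    """Hypothetical AI-feedback scorer for a full reasoning trajectory."""
    raise NotImplementedError

def fine_tune(agent, trajectories):
    """Hypothetical fine-tuning step; returns the updated agent."""
    raise NotImplementedError

def rest_self_improvement(agent, questions, num_iterations=2):
    training_pool = []  # the batch of accepted trajectories grows each round
    for _ in range(num_iterations):
        # 1. Roll out the current agent to collect reasoning trajectories.
        trajectories = [agent.run(q) for q in questions]
        # 2. Score each trajectory with AI feedback rather than human labels.
        scored = [(t, ai_rank(t)) for t in trajectories]
        # 3. Keep only the high-quality traces and add them to the pool.
        training_pool += [t for t, s in scored if s >= SCORE_THRESHOLD]
        # 4. Fine-tune the agent on the accumulated pool and iterate.
        agent = fine_tune(agent, training_pool)
    return agent
```

The "growing batch" aspect is captured by `training_pool` accumulating accepted trajectories across iterations rather than being rebuilt from scratch each round.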
The team reported that after just two runs of the algorithm, a compact fine-tuned model was obtained from the prompted large model. Despite having two orders of magnitude fewer parameters, the smaller model achieved comparable performance on challenging compositional question-answering benchmarks.
The team has summarized its main contributions as follows.
- A ReAct-style agent with self-critique has been introduced for long-form question answering.
- A proxy evaluation metric for the agent's self-evaluation has been proposed, using the Bamboogle and BamTwoogle datasets.
- Improved agent performance has been demonstrated by iteratively fine-tuning its reasoning traces in ReST fashion.
- Stepwise AI feedback has been used to improve the agent, eliminating the need for human-labeled training data.
- It has been shown that the agent can be effectively distilled into models one to two orders of magnitude smaller, using the synthetic data produced during this iterative process, while maintaining performance close to that of the pre-trained teacher agent (a rough sketch of this step follows the list).
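As a rough illustration of that last point, distillation here amounts to supervised fine-tuning of a small student model on filtered synthetic trajectories generated by the large teacher agent. The sketch below reuses the hypothetical `ai_rank` and `fine_tune` helpers from the earlier loop; it is an assumed shape, not the published procedure.

```python
# Distillation via synthetic data (illustrative only; reuses the hypothetical
# `ai_rank` and `fine_tune` helpers from the ReST sketch above).

def distill(teacher, student, questions, threshold=0.8):
    # Generate synthetic reasoning traces with the large teacher agent.
    synthetic_traces = [teacher.run(q) for q in questions]
    # Filter the traces with the same AI-feedback signal used during training.
    good_traces = [t for t in synthetic_traces if ai_rank(t) >= threshold]
    # Fine-tune the much smaller student model on the filtered traces only.
    return fine_tune(student, good_traces)
```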
In conclusion, this approach combines an iterative training technique, ReST, with an LLM agent designed in the ReAct style. By incorporating external knowledge and distilling the fine-tuned model to a fraction of its original parameter count, this combination addresses the challenges of answering difficult questions and improves performance on demanding benchmarks.
Check out the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.