Decision making and knowledge-intensive reasoning are two essential skills for large-scale natural language agents operating in unfamiliar environments. OpenAI’s GPT-3 and Google’s PaLM are just two examples of LLMs that have shown impressive performance across a wide range of benchmarks. The human-like ability of these models to understand tasks in specific environments represents a major step forward in natural language processing.
Grounding agents in natural language lets them overcome the high syntactic barriers that can lead to false-negative errors in complex tasks. However, learning optimal policies for natural language RL agents remains a significant challenge because their state spaces are large and often unbounded.
Various decision-making approaches have been proposed to help natural language agents make choices in text-based environments without the benefit of a learned policy. However, the model becomes more prone to hallucination over longer sequences, which reduces the accuracy of these methods as the number of subtasks grows.
The human-like capabilities of large-scale LLMs allow natural language agents to solve tasks more intuitively. Human-in-the-loop (HITL) methods have been widely used to boost performance by redirecting the agent’s reasoning trace after errors. Although this approach improves performance with relatively little human input, it is not autonomous, since it requires a human trainer to monitor the trajectory at every time step.
Researchers at Northeastern University and the Massachusetts Institute of Technology believe that, given the opportunity to independently close the trial-and-error loop, LLMs would make good use of natural-language-based self-optimization.
To verify their hypothesis, the team implements an approach called Reflexion: a self-reflective LLM paired with a simple heuristic that identifies hallucination and inefficient action execution within an LLM-based agent. They then put the agent to the test on two benchmarks that reward learning from error: the text-based AlfWorld and the question-answering dataset HotPotQA. The result is improved effectiveness in decision making and other knowledge-intensive tasks.
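In broad strokes, the loop is: attempt the task, detect a failed trial with a lightweight heuristic, have the LLM verbalize a self-reflection about what went wrong, store that reflection in memory, and retry with the reflections in context. The Python sketch below illustrates that trial-and-error structure only; the helpers `run_episode`, `llm_reflect`, `is_success`, and the repetition-based failure heuristic are hypothetical stand-ins, not the authors’ implementation.

```python
# Minimal sketch of a Reflexion-style trial-and-error loop (illustrative only).
# run_episode, llm_reflect, and is_success are hypothetical callables standing in
# for an LLM-backed agent rollout, a self-reflection prompt, and a success check.

from typing import Callable, List


def looks_like_failure(trajectory: List[str]) -> bool:
    """Simple heuristic (assumption): flag a trial as failed if the agent
    repeats actions often or the episode runs too long."""
    repeated = len(trajectory) - len(set(trajectory))
    return repeated > 3 or len(trajectory) > 30


def reflexion_loop(
    run_episode: Callable[[List[str]], List[str]],
    llm_reflect: Callable[[List[str]], str],
    is_success: Callable[[List[str]], bool],
    max_trials: int = 12,
) -> bool:
    """Run up to max_trials episodes, feeding verbal self-reflections from
    failed trials back into the agent's memory before the next attempt."""
    memory: List[str] = []  # persistent natural-language lessons across trials
    for _ in range(max_trials):
        trajectory = run_episode(memory)            # act with past reflections in context
        if is_success(trajectory):
            return True                             # task solved, stop retrying
        if looks_like_failure(trajectory):          # heuristic failure detection
            memory.append(llm_reflect(trajectory))  # store a lesson for the next trial
    return False
```

The key design point is that the feedback signal is plain natural language stored in the agent’s own memory, so the loop closes without any human monitoring or gradient updates.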
The Reflexion agent augments the ReAct problem-solving technique with the ability to reflect on its own performance, reaching a 97% success rate on the AlfWorld benchmark within just 12 autonomous trials, a significant improvement over the 75% accuracy of the base ReAct agent. On a set of 100 HotPotQA questions, a Reflexion-based ReAct agent outperformed a baseline ReAct agent by 17%, iteratively refining its content search and extraction based on hints drawn from its memory. It is important to note that Reflexion is not designed to achieve near-perfect accuracy scores; rather, its goal is to show how trial-and-error learning can enable progress on tasks and environments previously thought impossible to solve.
The team notes that Reflexion could be applied to more challenging problems, such as those where the agent must learn to generate novel ideas, explore previously unseen state spaces, and build more precise action plans from its history of experience.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 16k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.