Large language models (LLMs) are improving with every new development in the artificial intelligence industry. With each new version, LLMs become capable of meeting a wider range of application requirements and scenarios. ChatGPT, recently released by OpenAI and built on the GPT transformer architecture, is one of the most popular LLMs; with the latest GPT-4 model, it even works well with multimodal data.
The goal of AI has always been to develop models and techniques that automate repetitive tasks and solve complex problems by imitating humans. Although LLMs manipulate text successfully, using them to carry out computing tasks through keyboard and mouse actions raises challenges: the generated actions must be appropriate for the given task, feasible in the agent's current state, and executable. These three challenges are known as task grounding, state grounding, and agent grounding.
A new study has introduced an approach called Recursive Criticism and Improvement (RCI), which uses a pre-trained LLM agent to execute computing tasks guided by natural language. RCI uses a prompting scheme that first asks the LLM to generate an output, then to identify problems with that output, and finally to generate an updated output based on those problems.
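The generate-critique-improve loop can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `llm()` is a placeholder for any chat-completion call and is stubbed with canned responses so the sketch runs standalone, and the prompt wording is an assumption.

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model call; returns canned text for illustration.
    if "find problems" in prompt:
        return "The answer ignores the click order required by the task."
    if "improve your answer" in prompt:
        return "Improved answer that fixes the identified problem."
    return "Initial answer."

def rci(task: str, rounds: int = 2) -> str:
    """Recursive Criticism and Improvement: generate, then critique and revise."""
    answer = llm(task)  # initial output
    for _ in range(rounds):
        # Critique step: ask the model to identify problems with its output.
        critique = llm(
            f"Task: {task}\nAnswer: {answer}\n"
            "Review your previous answer and find problems with it."
        )
        # Improvement step: regenerate the output conditioned on the critique.
        answer = llm(
            f"Task: {task}\nAnswer: {answer}\nProblems: {critique}\n"
            "Based on the problems you found, improve your answer."
        )
    return answer
```

With a real model behind `llm()`, each round conditions the revision on the model's own critique of the previous answer.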
RCI addresses the three challenges of previous approaches, i.e., task grounding, state grounding, and agent grounding, resulting in better performance on computing tasks. For computer tasks, RCI prompting is applied in three stages: first, the LLM generates a high-level plan; then it generates an action based on the plan and the current state; and finally it formats that action into the right keyboard or mouse command.
Task grounding produces a high-level plan from the task text to ensure that the actions taken by the agent are appropriate for the given task. State grounding then connects the high-level concepts from that plan to the actual HTML elements present in the agent's current state, ensuring that the generated actions are feasible in that state. Finally, agent grounding ensures that the generated actions are executable and in the correct format.
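The three grounding stages above can be sketched as a simple pipeline. Everything here is illustrative: `llm()` is again a stub standing in for a real model call, the prompts are assumptions, and the `clickxpath`-style command format and regex check are hypothetical placeholders for an executability filter.

```python
import re

def llm(prompt: str) -> str:
    # Placeholder model: returns canned outputs keyed on the prompt stage.
    if "high-level plan" in prompt:
        return "1. Type the query. 2. Click the search button."
    if "current HTML state" in prompt:
        return "Click the element with id 'search-btn'."
    return 'clickxpath //*[@id="search-btn"]'

def task_grounding(task: str) -> str:
    # Stage 1: produce a plan appropriate for the task text.
    return llm(f"Task: {task}\nProduce a high-level plan.")

def state_grounding(plan: str, html: str) -> str:
    # Stage 2: connect the plan to elements in the current HTML state.
    return llm(
        f"Plan: {plan}\nGiven the current HTML state:\n{html}\n"
        "Choose the next feasible action."
    )

def agent_grounding(action: str) -> str:
    # Stage 3: format the action as an executable keyboard/mouse command
    # and reject anything that does not match a known command form.
    cmd = llm(f"Format as an executable command: {action}")
    if not re.match(r"^(clickxpath|type|press)\b", cmd):
        raise ValueError(f"not executable: {cmd}")
    return cmd
```

Chaining `task_grounding` → `state_grounding` → `agent_grounding` mirrors the plan / action / formatting stages described above.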
This approach lets ChatGPT solve general computing tasks using a keyboard and mouse, without the need for plugins. In RCI prompting, the LLM first identifies problems with its original answer and then, based on those problems, improves the answer. A distinctive feature of the approach is that it requires only a handful of demonstrations per task, as opposed to existing methods that require thousands.
The RCI approach outperforms existing LLM methods for automating computing tasks and surpasses supervised and reinforcement learning methods on the MiniWoB++ benchmark. Comparing RCI with chain-of-thought (CoT) prompting, a method recognized for its effectiveness on reasoning tasks, the researchers found a strong synergistic effect between RCI prompting and the CoT baselines. In conclusion, Recursive Criticism and Improvement (RCI) seems promising for solving complex computing tasks and reasoning problems with LLMs.
Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.