Large language models (LLMs) can quickly adapt to new tasks through in-context learning, given only a few demonstrations and natural-language instructions. This avoids hosting a task-specific model or annotating large datasets, but LLMs still struggle with multi-step reasoning, arithmetic, and access to up-to-date information. Recent research addresses these limitations by giving LLMs access to external tools for the harder intermediate steps, or by prompting them to emulate a chain of reasoning for multi-step problems. However, existing approaches that interleave chained reasoning with tool use are difficult to extend to new tasks and tools: they require fine-tuning or prompt engineering tailored to a specific task or tool.
In this work, researchers from the University of Washington, Microsoft, Meta, the University of California, and the Allen Institute for AI develop Automatic Reasoning and Tool-use (ART), a framework that automatically generates multi-step reasoning decompositions for instances of new tasks. ART retrieves demonstrations of related tasks from a task library to enable few-shot decomposition and tool use on the new task. These demonstrations are written in a flexible but structured query language that makes it easy to parse intermediate steps, pause generation to call external tools, and resume generation once a tool's output has been incorporated (Figure 1). The framework also selects and invokes the most suitable tools (such as search engines and code execution) at each step.
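To make the pause-and-resume flow concrete, here is a minimal Python sketch of such a loop. It is illustrative only: the step markers (e.g. `Q1: [search] ...`), the `llm_generate` helper, and the placeholder tools are assumptions for the sketch, not ART's actual query language or API.

```python
import re

def llm_generate(prompt: str, stop: list[str]) -> str:
    """Placeholder for an LLM completion call (an assumption, not ART's API)."""
    raise NotImplementedError("plug in an LLM endpoint here")

def run_search(query: str) -> str:
    """Placeholder search-engine tool."""
    raise NotImplementedError

def run_code(snippet: str) -> str:
    """Placeholder code-execution tool."""
    raise NotImplementedError

TOOLS = {"search": run_search, "code": run_code}
STEP = re.compile(r"Q\d+: \[(\w+)\] (.*)")  # e.g. "Q2: [search] Who wrote Hamlet?"

def art_solve(task_input: str, demos: str) -> str:
    """Generate a step-by-step program, pausing at each tool call to run it."""
    program = f"{demos}\nInput: {task_input}\n"
    for _ in range(16):  # cap the number of generation rounds
        # Ask the LLM for the next step; stop before it fabricates a tool result.
        step = llm_generate(program, stop=["\n#"])
        program += step
        if "Ans:" in step:  # the program emitted its final answer
            return step.split("Ans:", 1)[1].strip()
        match = STEP.search(step)
        if match and match.group(1) in TOOLS:
            # Pause generation, run the external tool, splice its output back
            # into the program, and resume generation on the next iteration.
            result = TOOLS[match.group(1)](match.group(2))
            program += f"\n#: {result}\n"
    raise RuntimeError("no final answer produced")
```

Because the intermediate steps are parseable, the controller (not the model) decides when to hand off to a tool and when to continue generating.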
Through these demonstrations, ART shows the LLM how to decompose instances of related tasks and how to select and use any tool from the tool library represented in those examples. This helps the model generalize zero-shot: it decomposes new tasks and picks appropriate tools without additional training. In addition, users can update the task and tool libraries, adding fresh examples as needed to fix errors in a reasoning chain or to register new tools (for example, ones specific to the task at hand).
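As one way to picture the task library, the sketch below retrieves the stored demonstrations most similar to a new task. The `TaskExample` structure and the token-overlap similarity are stand-ins chosen for the sketch, not the paper's actual selection strategy.

```python
from dataclasses import dataclass

@dataclass
class TaskExample:
    description: str   # short natural-language description of the library task
    program: str       # worked decomposition, including its tool calls

def select_demos(library: list[TaskExample], new_task: str, k: int = 3) -> str:
    """Return the k library programs whose task descriptions best match new_task."""
    def overlap(example: TaskExample) -> float:
        # Token-overlap (Jaccard) similarity keeps the sketch dependency-free;
        # any stronger text-similarity measure could be substituted here.
        a = set(example.description.lower().split())
        b = set(new_task.lower().split())
        return len(a & b) / max(len(a | b), 1)
    best = sorted(library, key=overlap, reverse=True)[:k]
    return "\n\n".join(example.program for example in best)
```

Under this view, fixing a faulty reasoning chain or adding a new tool amounts to editing or appending library entries rather than re-engineering prompts.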
The authors build a task library from 15 BigBench tasks and evaluate ART on 19 previously unseen BigBench test tasks, 6 MMLU tasks, and several tasks from prior tool-use research (SQuAD, TriviaQA, SVAMP, MAWPS). On 32 of the 34 BigBench tasks and on all MMLU tasks, ART consistently matches or exceeds automatically generated CoT reasoning chains, by more than 22 percentage points on average. Allowing tool use improves performance on the test tasks by roughly 12.3 percentage points on average over generation without tools.
On average, ART outperforms direct few-shot prompting on BigBench and MMLU tasks by 10.8 percentage points. On unseen tasks requiring mathematical and algorithmic reasoning, it beats direct few-shot prompting by 12.5 percentage points and exceeds the best-known GPT-3 results, which include supervision for decomposition and tool use, by 6.1 percentage points. Updating the task and tool libraries with new examples lets humans intervene and improve the reasoning process, making it straightforward to raise performance on any task with minimal human effort. With additional human feedback, ART surpasses the best-known GPT-3 results by an average of more than 20 percentage points across 12 test tasks.
Check out the paper and project page. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.