Recent advances in large language models (LLMs) have greatly improved their ability to interpret and execute instructions. Despite these advances, LLMs still make errors when recalling and composing world knowledge, which leads to inaccurate answers. To address this, augmenting LLMs with auxiliary tools, such as search engines or calculators used during inference, has been proposed to improve their reasoning. However, existing tool-augmented LLMs struggle to use tools efficiently for multi-step reasoning, particularly when tool calls must be interleaved with reasoning steps and inference wait times kept to a minimum.
In response to these challenges, this EPFL and Meta research presents the Chain-of-Abstraction (CoA) reasoning method, a robust and efficient approach for LLMs to perform multi-step reasoning with tools. The core idea is illustrated in Figure 1: LLMs are fine-tuned to generate reasoning chains with abstract placeholders (e.g., y1, y2, y3), which are then filled in with specific knowledge obtained from external tools, such as calculators or web search engines, to ground the final answer.
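To make the placeholder mechanism concrete, here is a minimal Python sketch of CoA-style reification, assuming a calculator is the only tool and using a hard-coded abstract chain to stand in for the fine-tuned LLM's output; the function names and the regex are illustrative, not the authors' implementation.

```python
import re

def generate_abstract_chain(question):
    # Stand-in for the fine-tuned LLM: a real system would decode a chain
    # whose intermediate results are abstract placeholders (y1, y2, ...).
    return ("Tom has 3 boxes of 12 apples, so he has 3 * 12 = y1 apples. "
            "After giving away 5 apples he has y1 - 5 = y2 left. The answer is y2.")

def reify_chain(chain):
    # Fill each placeholder left to right, using a calculator as the tool.
    values = {}
    for expr, var in re.findall(r"([0-9y\s+\-*/().]+)\s*=\s*(y\d+)", chain):
        # Substitute previously solved placeholders, then compute the expression.
        expr = " ".join(values.get(tok, tok) for tok in expr.split())
        values[var] = str(eval(expr))  # eval stands in for a real calculator tool
    for var, val in values.items():
        chain = chain.replace(var, val)
    return chain

question = "Tom has 3 boxes of 12 apples and gives 5 away. How many are left?"
print(reify_chain(generate_abstract_chain(question)))
# -> "... 3 * 12 = 36 apples. After giving away 5 apples he has 36 - 5 = 31 left. The answer is 31."
```

In a real system the chain would be decoded by the fine-tuned model, and placeholders could just as well be filled by a search engine rather than arithmetic evaluation.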
Furthermore, unlike previous methods in which LLM decoding and API calls are interleaved, CoA reasoning promotes effective planning by encouraging LLMs to interconnect multiple tool calls and adopt more feasible reasoning strategies. The abstract chain of reasoning lets LLMs focus on general, holistic reasoning strategies without having to generate instance-specific knowledge from their own parameters. Decoupling general reasoning from domain-specific knowledge also enables parallel processing: the LLM can generate the next abstract chain while tools fill in the current one, speeding up overall inference.
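The claimed speedup comes from overlapping these two stages. The sketch below uses simulated latencies and hypothetical stand-in functions to show how decoding the next abstract chain can proceed while a background worker reifies the current one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def decode_abstract_chain(question):
    # Stand-in for the fine-tuned LLM decoding an abstract chain (GPU-bound).
    time.sleep(0.5)  # simulated decoding latency
    return f"abstract chain for: {question}"

def fill_with_tools(chain):
    # Stand-in for the calculator / search calls that reify the placeholders.
    time.sleep(0.5)  # simulated tool/API latency
    return chain.replace("abstract chain", "reified answer")

def pipelined_inference(questions):
    # While the tool worker reifies chain i, the main thread already decodes
    # chain i+1, so decoding and tool latencies overlap instead of adding up.
    answers, pending = [], None
    with ThreadPoolExecutor(max_workers=1) as tool_pool:
        for question in questions:
            chain = decode_abstract_chain(question)
            if pending is not None:
                answers.append(pending.result())  # collect the previous answer
            pending = tool_pool.submit(fill_with_tools, chain)
        if pending is not None:
            answers.append(pending.result())
    return answers

start = time.time()
print(pipelined_inference(["q1", "q2", "q3"]))
print(f"pipelined wall-clock time: {time.time() - start:.1f}s (vs. ~3.0s sequentially)")
```

With three questions, the pipelined loop finishes in roughly two seconds instead of three, illustrating how overlapping decoding with tool calls reduces overall wait time.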
To train LLMs for CoA reasoning, the authors construct fine-tuning data by repurposing existing open-source question answering datasets (Cobbe et al., 2021; Miao et al., 2020; Yang et al., 2018). LLaMa-70B is prompted to rewrite gold answers as abstract reasoning chains, replacing specific operations with abstract placeholders. The resulting CoA traces are then validated with domain-specialized tools to ensure their accuracy.
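One plausible way to picture this validation step is to re-derive every placeholder with the domain tool (here, simple arithmetic) and keep only traces whose final value matches the gold answer; the trace format and helper functions below are assumptions for illustration, not the authors' pipeline.

```python
import re

def reify_with_calculator(trace):
    # Re-derive every abstract placeholder in a rewritten trace using arithmetic.
    values = {}
    for expr, var in re.findall(r"([0-9y\s+\-*/().]+)\s*=\s*(y\d+)", trace):
        expr = " ".join(values.get(tok, tok) for tok in expr.split())
        values[var] = str(eval(expr))  # eval stands in for the calculator tool
    return values

def is_valid_trace(trace, gold_answer):
    # Keep a rewritten trace only if its final placeholder reproduces the gold answer.
    values = reify_with_calculator(trace)
    final_var = max(values, key=lambda v: int(v[1:]))  # highest-numbered placeholder
    return abs(float(values[final_var]) - gold_answer) < 1e-6

rewritten = "She buys 4 * 3 = y1 pens and then 2 more, so she owns y1 + 2 = y2 pens."
print(is_valid_trace(rewritten, gold_answer=14))  # True: 4 * 3 = 12 and 12 + 2 = 14
```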
The CoA method is evaluated in two domains: mathematical reasoning and Wikipedia question answering (Wiki QA). For mathematical reasoning, LLMs are trained with CoA data constructed by rewriting the GSM8K training set (Cobbe et al., 2021). CoA outperforms few-shot prompting and regular fine-tuning baselines on both in-distribution and out-of-distribution datasets, demonstrating its effectiveness in multi-step reasoning tasks, and it also surpasses the Toolformer baseline.
In the Wiki QA domain, HotpotQA (Yang et al., 2018) is used to construct the CoA fine-tuning data. CoA outperforms baselines, including Toolformer, and generalizes well across diverse question answering datasets (WebQuestions, NaturalQuestions, TriviaQA). Domain tools, such as a Wikipedia search engine and a named-entity recognition toolkit, further improve CoA performance.
Evaluation results in both domains show significant improvements with the CoA method, with average accuracy gains of approximately 7.5% on mathematical reasoning and 4.5% on Wiki QA. These gains hold on both in-distribution and out-of-distribution test sets and are especially pronounced on questions that require complex chain-of-thought reasoning. CoA also achieves faster inference than previous tool-augmentation methods on both tasks.
In conclusion, the proposed CoA reasoning method separates general reasoning from domain-specific knowledge, enabling more robust multi-step reasoning in LLMs. Its efficient use of tools leads to faster inference, making it a promising approach for a range of reasoning scenarios. The experiments on mathematical reasoning and Wiki QA underline the versatility and effectiveness of CoA and suggest its potential to improve LLM performance in broader domains.