Large language models (LLMs) such as GPT-3, Codex, PaLM, LLaMA, ChatGPT, and the more recent GPT-4 have made significant strides in recent years. Their outstanding performance in in-context learning, code generation, and a wide range of other NLP tasks is pushing LLMs ever closer to artificial general intelligence. Despite these impressive accomplishments, current LLMs still have drawbacks: they cannot access or react to up-to-date information, they frequently fail to produce precise and interpretable mathematical solutions, and they are unstable when reasoning over long chains of logic. These shortcomings have motivated a line of research that equips LLMs with external tools to lighten their memorization burden and improve their problem-solving competence. For instance, connecting a web search engine or a question-answering (QA) system lets LLMs learn when and how to draw on external resources. Recent work also incorporates additional external tools, including GitHub resources, neural network models (such as Hugging Face modules), and code interpreters (such as the Python interpreter). Before applying these tools to complicated problems, however, LLMs must lay out extensive plans.
Tool-augmented LLMs nevertheless still face several difficulties, particularly in the following areas: (1) Most current work concentrates on a small number of tools, while the variety of potential new tasks is essentially limitless, so it can be hard to find an existing tool that fits a new problem. (2) The way language models currently reason about how best to use tools is inherently complicated: handling a task end to end requires extensive planning, which places a heavy cognitive strain on the models and carries a high learning cost. (3) Tool-use pipelines lack a well-defined, automated error-handling mechanism once execution results come back, so the accuracy and robustness of the framework still need improvement. In this work, researchers from Tsinghua University and the University of Illinois approach these obstacles from a fresh perspective: rather than casting LLMs as consumers of tools, they empower LLMs to be the developers of tools, so that problems can be solved with greater accuracy and flexibility.
To this end, they introduce CREATOR, a tool-creation framework that leverages LLMs' ability to build tools and to rectify them based on execution feedback before tackling a specific problem. Figure 1 illustrates how the CREATOR pipeline differs from a typical tool-using framework: the tool-using framework focuses on reasoning about which APIs to select and how to plan their use, whereas CREATOR focuses on diversifying the toolset, decoupling different levels of reasoning, and improving the framework's robustness and accuracy.
CREATOR can be broken down into four steps (a minimal code sketch of the loop follows the list):
• Creation: Using the LLM's capacity for abstract reasoning about the problem, create a broadly applicable tool, with both documentation and code realization.
• Decision: Decide when and how to apply the created tool to solve the problem.
• Implementation: Run the program in which the LLM uses the tool to address the problem.
• Rectification: Based on the execution results, revise the tools and decisions.
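To make the four stages concrete, here is a minimal Python sketch of such a create-decide-execute-rectify loop. It assumes a hypothetical llm(prompt) helper that returns text from a chat model such as ChatGPT; the prompts, function names, and retry budget are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the four CREATOR stages; prompts and helpers are assumed.
import traceback

def llm(prompt: str) -> str:
    """Placeholder for an LLM API call (e.g., ChatGPT); swap in a real client."""
    raise NotImplementedError

def create_tool(problem: str) -> str:
    # Creation: ask for a broadly applicable tool with documentation + code.
    return llm("Write a documented, reusable Python function that helps solve "
               f"problems like the following:\n{problem}")

def decide_usage(problem: str, tool: str) -> str:
    # Decision: ask when and how to call the tool; store the result in `answer`.
    return llm(f"Given this tool:\n{tool}\nWrite Python code that calls it to "
               f"solve the problem below and assigns the result to `answer`.\n{problem}")

def run(program: str):
    # Implementation: execute tool + calling code (sandbox this in practice).
    scope: dict = {}
    try:
        exec(program, scope)
        return True, str(scope.get("answer"))
    except Exception:
        return False, traceback.format_exc()

def solve(problem: str, max_rectifications: int = 3) -> str:
    tool = create_tool(problem)
    program = tool + "\n" + decide_usage(problem, tool)
    ok, result = run(program)
    for _ in range(max_rectifications):
        if ok:
            return result
        # Rectification: feed the traceback back and let the model revise the program.
        program = llm(f"The following program failed:\n{program}\n"
                      f"Error:\n{result}\nReturn a corrected program that assigns "
                      "the final answer to `answer`.")
        ok, result = run(program)
    return result if ok else "unresolved"
```

In a real deployment the generated program would be run in a sandboxed interpreter rather than with a bare exec, and the created tools could be cached for reuse across related problems.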
They first evaluate CREATOR on two existing benchmarks, MATH and TabMWP, to gauge how effective the design is. The MATH dataset contains difficult and diverse math competition problems, while TabMWP offers a variety of tabular settings for problem solving. Notably, ChatGPT built on CREATOR outperforms the standard chain-of-thought (CoT), program-of-thought (PoT), and tool-using baselines by considerable margins, achieving average accuracies of 59.7% and 94.7% on the MATH and TabMWP datasets, respectively.
Because existing benchmarks are not specifically designed to evaluate tool creation, they additionally propose the Creation Challenge dataset, which consists of novel, challenging problems that cannot be adequately solved with existing tools or code packages. Using this dataset, they demonstrate the value and necessity of LLMs' tool-creation ability. They also present experimental findings and case studies showing that tool creation encourages knowledge transfer, and that LLMs exhibit varying levels of tool-creation proficiency, which enables them to adapt more effectively to different problem settings.
Check out the Paper.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.