While large language models (LLMs) excel in many areas, they can struggle with complex tasks that require precise reasoning. Recent solutions often focus on sophisticated ensemble methods or frameworks in which multiple LLM agents collaborate. These approaches certainly improve performance, but they add layers of complexity. What if a simpler strategy could deliver significant gains?
This work investigates a fascinating phenomenon: the potential to improve LLM performance simply by increasing the number of agents used. It introduces a remarkably simple method (sampling and voting) that generates multiple results from an LLM and uses majority voting to decide the final answer. Let's delve into the details.
The sampling and voting method
At its core, the sampling and voting method is surprisingly simple and consists of two phases (see Figure 2):
- Sampling: The task query is repeatedly fed into an LLM (or a framework with multiple LLM agents), generating multiple results (samples).
- Voting: Majority voting determines the final answer. For closed-ended tasks (e.g., multiple choice), this means counting the frequency of each option. For open-ended tasks (e.g., code generation), similarity measures such as the BLEU score are used to compare the samples, and the sample most similar to all the others wins.
This process (Algorithm 1) is elegantly agnostic, making it a powerful complement to enhance existing LLM techniques.
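The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation: `generate` stands in for any LLM call, and `difflib.SequenceMatcher` is used as a simple stand-in for the BLEU similarity the paper uses on open-ended tasks.

```python
from collections import Counter
from difflib import SequenceMatcher

def majority_vote(samples):
    """Closed-ended tasks (e.g., multiple choice): return the most
    frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]

def similarity_vote(samples):
    """Open-ended tasks (e.g., code generation): return the sample with
    the highest cumulative similarity to all other samples. The paper
    uses BLEU; SequenceMatcher is a simple stand-in here."""
    def total_similarity(i):
        return sum(SequenceMatcher(None, samples[i], samples[j]).ratio()
                   for j in range(len(samples)) if j != i)
    return samples[max(range(len(samples)), key=total_similarity)]

def sample_and_vote(generate, query, num_agents, closed_ended=True):
    """Phase 1 (sampling): query the LLM num_agents times.
    Phase 2 (voting): aggregate the samples into a final answer.
    `generate` is any callable mapping a query string to an answer."""
    samples = [generate(query) for _ in range(num_agents)]
    return majority_vote(samples) if closed_ended else similarity_vote(samples)
```

Because the voting step only looks at the samples, the same wrapper works whether `generate` is a single LLM call or an entire multi-agent framework, which is what makes the method agnostic to the underlying technique.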
The effectiveness of the method is comprehensively evaluated on three categories of tasks:
- Arithmetic reasoning: GSM8K and the challenging MATH dataset
- General reasoning: MMLU and a chess state tracking task
- Code generation: the HumanEval dataset
To explore the range of benefits, the authors tested language models of different scales, including Llama2, GPT-3.5-Turbo, and GPT-4.
To test how well the method composes with existing approaches, it was combined with various techniques:
- Prompt engineering: integration with Chain-of-Thought (CoT), Zero-Shot CoT, and Solo Performance Prompting.
- Multi-agent collaboration: used in conjunction with debate-style methods (LLM-Debate) and self-reflection.
The results offer compelling insights:
- Performance scaling: Increasing the number of agents generally improves LLM performance across tasks and models of different sizes. Surprisingly, smaller LLMs, when scaled up with more agents, often rival or surpass their larger counterparts (Fig. 1).
- Compatibility: The method combines seamlessly with other techniques, resulting in even greater performance gains.
- Simplicity versus complexity: In most cases, the proposed method alone achieves results on par with more complex approaches, suggesting the power of its simple design.
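The scaling trend has a simple statistical intuition behind it: if each independent sample is correct with probability better than chance, majority voting becomes more reliable as samples are added. The back-of-the-envelope calculation below is an illustration of that intuition (not a result from the paper), using a hypothetical per-sample accuracy of 0.6 on a binary-outcome task.

```python
from math import comb

def majority_correct_prob(p, n):
    """Probability that the majority of n independent samples is correct
    when each sample is correct with probability p (binary outcome).
    Ties (possible for even n) are counted as incorrect for simplicity."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Accuracy of majority voting grows with the number of agents.
for n in (1, 5, 15, 45):
    print(n, round(majority_correct_prob(0.6, n), 3))
```

Real benchmarks are messier (answers are not binary, and samples from the same model are not fully independent), but the calculation shows why simply adding agents can lift a weaker model's effective accuracy.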
Extensive experiments demonstrate the robustness of the method across hyperparameters (Fig. 4) and reveal a key point: performance improvements are positively correlated with task difficulty (Table 5). To analyze this relationship, three dimensions of difficulty are isolated:
- Inherent difficulty: Gains first increase and then decrease as problems become extremely difficult.
- Number of steps: Gains become more pronounced as the number of steps required to solve the task increases.
- Prior probability: Performance improves when the prior probability of a correct answer is higher.
These findings motivate optimizations such as stepwise or hierarchical sampling and voting, which maximize gains through a nuanced understanding of task difficulty.
In conclusion, this work sets a new benchmark, demonstrating that sometimes “more agents” may be all that is needed. In many cases, scaling LLM agents with a simple sampling and voting strategy significantly improves performance without complex methods. This discovery simplifies complex LLM applications and paves the way for cost optimization of future systems, a focus of ongoing research.
Vineet Kumar is a Consulting Intern at MarktechPost. He is currently pursuing his bachelor's degree from the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast, passionate about research and the latest advances in Deep Learning, Computer Vision, and related fields.