Large language models (LLMs) have shown significant potential in reasoning tasks, using methods such as chain of thought (CoT) to break down complex problems into manageable steps. However, this capability comes with challenges. CoT prompts often increase token usage, resulting in higher computational costs and energy consumption. This inefficiency is a concern for applications that require both precision and resource efficiency. Current LLMs tend to generate unnecessarily long outputs, which do not always improve accuracy but do add cost. The key challenge is to strike a balance between reasoning performance and resource efficiency.
Researchers from Nanjing University, Rutgers University, and UMass Amherst have introduced an LLM reasoning framework based on token budgeting. This framework dynamically estimates a token budget based on the complexity of a reasoning task and uses the estimate to guide the reasoning process. Known as TALE (Token-Budget-Aware LLM rEasoning), the approach seeks to reduce token usage without compromising the accuracy of responses. By integrating a token budget into CoT prompts, TALE offers a practical way to improve cost-effectiveness in LLMs while maintaining their performance.
Technical details and benefits
TALE operates in two main phases: budget estimation and token-budget-aware reasoning. Initially, it estimates an appropriate token budget for a problem using methods such as zero-shot prediction or regression-based estimators. This estimate is then integrated into the prompt to encourage the LLM to generate concise but accurate responses.
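In the zero-shot setup, the budget estimate can come from the model itself before the actual reasoning prompt is issued. The sketch below illustrates this two-phase flow; the prompt wording, the `ask` callable, and the fallback budget are illustrative assumptions, not TALE's exact implementation:

```python
# Sketch of TALE's two phases: (1) ask the LLM for a token-budget
# estimate, (2) inject that budget into the CoT prompt.
# `ask` is a placeholder for any chat-completion call
# (prompt string in, reply string out).

def estimate_budget(question: str, ask) -> int:
    # Phase 1: zero-shot budget estimation by the LLM itself.
    prompt = (
        "Estimate how many output tokens are needed to answer the "
        f"following question. Reply with a number only.\n{question}"
    )
    reply = ask(prompt)
    digits = "".join(ch for ch in reply if ch.isdigit())
    return int(digits) if digits else 256  # fallback default (assumed)

def budget_aware_prompt(question: str, budget: int) -> str:
    # Phase 2: token-budget-aware CoT prompt.
    return (
        "Let's think step by step and use less than "
        f"{budget} tokens:\n{question}"
    )
```

For example, `estimate_budget("What is 15% of 80?", ask)` would first query the model for a numeric estimate, and the returned budget would then be embedded in the reasoning prompt via `budget_aware_prompt`.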
A key innovation in TALE is the concept of “token elasticity,” which identifies an optimal range of token budgets that minimizes token usage and preserves accuracy. Using iterative search techniques such as binary search, TALE determines the optimal budget for various LLM tasks and architectures. On average, the framework achieves a 68.64% reduction in token usage with less than a 5% decrease in accuracy, making it a practical and adaptable approach to token efficiency.
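The iterative search described above can be sketched as a binary search over candidate budgets, shrinking the budget as long as accuracy stays within tolerance of the unconstrained baseline. This is a minimal illustration, assuming accuracy is roughly monotone in the budget; the toy accuracy curve stands in for actually evaluating the LLM at each candidate budget:

```python
# Sketch of a binary search for the smallest token budget whose
# accuracy stays within `tolerance` of the unconstrained baseline.
# `accuracy_at(budget)` is a stand-in for running the task under
# that budget and measuring accuracy.

def search_budget(accuracy_at, baseline_acc, tolerance=0.05,
                  lo=1, hi=1024):
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if accuracy_at(mid) >= baseline_acc - tolerance:
            best = mid        # budget is sufficient; try smaller
            hi = mid - 1
        else:
            lo = mid + 1      # too tight; relax the budget
    return best

# Toy accuracy curve: performance saturates once the budget
# reaches 80 tokens (purely illustrative numbers).
def toy_accuracy(budget):
    return 0.85 if budget >= 80 else 0.60
```

With this toy curve, `search_budget(toy_accuracy, baseline_acc=0.85)` converges on a budget of 80, the smallest value that keeps accuracy within the 5% tolerance, mirroring the trade-off the paper reports.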
Results and insights
Experiments demonstrate the effectiveness of TALE on benchmarks such as GSM8K and MathBench. For example, on the GSM8K dataset, TALE achieved an accuracy of 84.46%, outperforming the vanilla CoT method and reducing token costs from 318.10 to 77.26 on average. On GSM8K-Zero, it reduced token costs by 91% while maintaining 98.72% accuracy.
TALE also generalizes well to different LLMs, such as GPT-4o-mini and Yi-lightning. When applied to the MathBench-College dataset, TALE reduced token costs by up to 70% while maintaining competitive accuracy. Additionally, the framework significantly reduces operational expenses, cutting costs by 59% on average compared to Vanilla CoT. These results highlight TALE's ability to improve efficiency without sacrificing performance, making it suitable for a variety of applications.
Conclusion
The Token-Budget-Aware LLM reasoning framework addresses the inefficiency of token usage in reasoning tasks. By dynamically estimating and applying token budgets, TALE strikes a balance between accuracy and cost-effectiveness. This approach reduces computational overhead and expands the accessibility of advanced LLM capabilities. As AI continues to evolve, frameworks like TALE offer a path toward more efficient and sustainable use of LLMs in both academic and industrial contexts.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has more than 2 million monthly visits, illustrating its popularity among readers.