This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
Large language models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advances in mixture-of-experts (MoE) models, speculative decoding, and early exit strategies leverage the insight that computational demands can vary significantly depending on the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential of these adaptive methods. To address this need, we study adaptive computation in LLMs more systematically. We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network (FFN) layer of the LLM. This design enables dynamic routing of tokens based on task complexity: at each layer, a token can be processed by the small module, the large module, or skip the layer entirely. It also allows us to introduce a novel notion of a token's difficulty, defined by its potential to benefit from additional computational resources. Importantly, by employing oracles to identify optimal patterns of adaptive computation, we gain valuable insights into the inner workings of LLMs and into routing processes in a simplified heterogeneous MoE setup. We show that trained routers operate differently from oracles and often produce suboptimal solutions. Notably, activating a large module in just a single layer outperforms models that use large modules in all layers, underscoring the gap between practical implementations of routing in MoE models and theoretical optima for adaptive computation.
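As a rough illustration of the design described above, the sketch below shows one possible per-layer adaptive FFN in PyTorch, where each token is routed to a small module, a large module, or skips the layer. The module and attribute names (AdaptiveFFN, small_ffn, large_ffn, router) and the hard argmax routing are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AdaptiveFFN(nn.Module):
    """Illustrative adaptive FFN layer (assumed structure, not the paper's code):
    each token is routed to a small module, a large module, or skips the layer."""

    def __init__(self, d_model: int, d_small: int, d_large: int):
        super().__init__()
        self.small_ffn = nn.Sequential(
            nn.Linear(d_model, d_small), nn.GELU(), nn.Linear(d_small, d_model)
        )
        self.large_ffn = nn.Sequential(
            nn.Linear(d_model, d_large), nn.GELU(), nn.Linear(d_large, d_model)
        )
        # Router scores three options per token: 0 = skip, 1 = small, 2 = large.
        self.router = nn.Linear(d_model, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        route = self.router(x).argmax(dim=-1)  # hard per-token decision
        out = x.clone()                        # option 0: skip (identity)
        small_mask = route == 1
        large_mask = route == 2
        if small_mask.any():
            out[small_mask] = x[small_mask] + self.small_ffn(x[small_mask])
        if large_mask.any():
            out[large_mask] = x[large_mask] + self.large_ffn(x[large_mask])
        return out
```

In practice, the hard argmax routing shown here is not differentiable; training such a router typically relies on a soft or supervised mechanism (for example, the oracle routing patterns studied in the paper), and the decisions could equally be supplied by a precomputed oracle instead of a learned router.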