Introducing Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art acceleration for large language model (LLM) inference. The performance gains are driven by three key aspects: (1) leveraging a recurrent neural network (RNN) draft model conditioned on the LLM's hidden states, (2) applying a dynamic tree attention algorithm over beam search results to eliminate duplicate prefixes among candidate sequences, and (3) training through knowledge distillation from the LLM. ReDrafter speeds up Vicuna inference on MT-Bench by up to 3.5x with a PyTorch implementation on an Nvidia H100 GPU. To demonstrate its practicality in production environments, we integrated ReDrafter into TensorRT-LLM, achieving up to 2.5x speedup on H100 GPUs. We also validated its effectiveness for on-device applications by implementing the approach in MLX and benchmarking performance on Metal GPUs in Apple Silicon chips, achieving up to 2.3x speedup.
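To make aspect (1) concrete, the sketch below shows how an RNN draft model can be conditioned on the LLM's hidden state: the draft head is seeded with the last hidden state produced by the LLM and then autoregressively proposes several draft tokens. This is a minimal illustration in PyTorch, not the exact ReDrafter architecture; the class name, the single-layer GRU cell, and the greedy decoding loop are illustrative assumptions (ReDrafter uses beam search over the drafts).

```python
import torch
import torch.nn as nn

class RNNDraftHead(nn.Module):
    """Minimal sketch of an RNN draft model conditioned on the LLM's hidden state.

    The layer choices (embedding + GRUCell + linear head) are illustrative,
    not the paper's exact drafter architecture.
    """
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRUCell(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    @torch.no_grad()
    def propose(self, llm_hidden: torch.Tensor, last_token: torch.Tensor, num_draft: int) -> torch.Tensor:
        """Autoregressively propose `num_draft` tokens.

        llm_hidden: (batch, hidden_size) last hidden state from the target LLM,
                    used to seed the drafter's recurrent state.
        last_token: (batch,) most recently accepted token id.
        """
        state = llm_hidden
        token = last_token
        drafts = []
        for _ in range(num_draft):
            state = self.rnn(self.embed(token), state)   # recurrent update per draft step
            token = self.lm_head(state).argmax(dim=-1)   # greedy here; beams in practice
            drafts.append(token)
        return torch.stack(drafts, dim=1)  # (batch, num_draft) candidate continuation
```

In a speculative decoding loop, the target LLM would then verify these draft tokens in a single forward pass and accept the longest matching prefix, which is where the reported speedups come from.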