Recurrent Drafter for fast speculative decoding in large language models
Introducing Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art acceleration for large language model (LLM) inference. The ...