Introducing Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art acceleration for large language model (LLM) inference. The performance gains are driven by three key aspects: (1) leveraging a recurrent neural network (RNN) draft model conditioned on the LLM's hidden states, (2) applying a dynamic tree attention algorithm over beam search results to eliminate duplicate prefixes among candidate sequences, and (3) training through knowledge distillation from the LLM. ReDrafter speeds up Vicuna inference on MT-Bench by up to 3.5x with a PyTorch implementation on an Nvidia H100 GPU. To demonstrate its practicality in production environments, we integrated ReDrafter into TensorRT-LLM, achieving up to 2.5x speedup on H100 GPUs. We also validated its effectiveness for on-device applications by implementing the approach in MLX and benchmarking performance on Metal GPUs in Apple Silicon chips, achieving up to 2.3x speedup.
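To make aspect (1) concrete, the sketch below shows how an RNN draft model can be conditioned on the LLM's hidden state: the draft head is seeded with the last hidden state produced by the LLM and then autoregressively proposes several draft tokens. This is a minimal illustration in PyTorch, not the exact ReDrafter architecture; the class name, the single-layer GRU cell, and the greedy decoding loop are illustrative assumptions (ReDrafter uses beam search over the drafts).

```python
import torch
import torch.nn as nn

class RNNDraftHead(nn.Module):
    """Minimal sketch of an RNN draft model conditioned on the LLM's hidden state.

    The layer choices (embedding + GRUCell + linear head) are illustrative,
    not the paper's exact drafter architecture.
    """
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRUCell(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    @torch.no_grad()
    def propose(self, llm_hidden: torch.Tensor, last_token: torch.Tensor, num_draft: int) -> torch.Tensor:
        """Autoregressively propose `num_draft` tokens.

        llm_hidden: (batch, hidden_size) last hidden state from the target LLM,
                    used to seed the drafter's recurrent state.
        last_token: (batch,) most recently accepted token id.
        """
        state = llm_hidden
        token = last_token
        drafts = []
        for _ in range(num_draft):
            state = self.rnn(self.embed(token), state)   # recurrent update per draft step
            token = self.lm_head(state).argmax(dim=-1)   # greedy here; beams in practice
            drafts.append(token)
        return torch.stack(drafts, dim=1)  # (batch, num_draft) candidate continuation
```

In a speculative decoding loop, the target LLM would then verify these draft tokens in a single forward pass and accept the longest matching prefix, which is where the reported speedups come from.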