EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec
EAGLE 3.1 addresses the performance degradation that speculative decoding suffers with long contexts and varied chat templates. It introduces FC normalization and a post-norm design, doubling acceptance length in long-context scenarios and making inference acceleration significantly more robust and practical.
vLLM Blog