
Tag: Training Frameworks (1 article)

Speculators v0.5.0: DFlash Support and Online Training

The Speculators v0.5.0 release adds support for DFlash, a speculative decoding algorithm that generates draft tokens in a single forward pass to reduce inference latency. It also unifies the online and offline training workflows.

vLLM Blog