Speculators v0.5.0: DFlash Support and Online Training
Speculators v0.5.0 adds support for DFlash, a speculative decoding algorithm that generates draft tokens in a single forward pass to reduce inference latency. The release also unifies the online and offline training workflows.
vLLM Blog