vLLM x Novita AI: PegaFlow for Production-Grade External KV Cache
vLLM and Novita AI collaborate on PegaFlow, which externalizes the KV cache into a standalone service with a three-level cache hierarchy, doubling startup speed and significantly increasing throughput.
vLLM Blog · May 18, 2026
vLLM V0 to V1: Correctness Before Corrections in RL
ServiceNow AI discovered that subtle differences in vLLM V1's inference engine could crash RL training and restored stability by fixing four critical backend issues.
Hugging Face Blog · May 7, 2026
The State of FP8 KV-Cache and Attention Quantization in vLLM
vLLM's comprehensive testing shows that FP8 KV-cache quantization can significantly reduce memory usage and decoding cost under specific conditions. It also introduces critical accuracy and performance pitfalls in certain models and scenarios, so it should be adopted with care.
vLLM Blog · Apr 22, 2026