Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor
Poolside's 33B-parameter agentic coding model, Laguna XS.2, achieves a 2-3x inference speedup without quality loss through native vLLM integration, DFlash speculative decoding, and LLM Compressor quantization.
vLLM Blog · May 28, 2026
The State of FP8 KV-Cache and Attention Quantization in vLLM
vLLM's comprehensive testing shows that FP8 KV-cache quantization can significantly reduce memory usage and decoding costs under specific conditions, but it introduces critical accuracy and performance pitfalls in certain models and scenarios, so it should be adopted with care.
vLLM Blog · Apr 22, 2026