Tag: Quantization | BitByAI

Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor

Poolside's 33B-parameter agentic coding model, Laguna XS.2, achieves a 2-3x inference speedup without quality loss through native vLLM integration, DFlash speculative decoding, and LLM Compressor quantization.

vLLM Blog · May 28, 2026

Tag: Quantization (1 article)
