Tag: Model Quantization

TML Inkling on vLLM: Day-0 Support with Optimized Performance

vLLM provides day-0 support for TML Inkling, achieving 380 tok/s on four GB200 GPUs with full feature parity, a 1M-token context, and multimodal input.

vLLM Blog · Jul 15, 2026