Caching Technology - Tag

Serving Agentic Workloads at Scale with vLLM x Mooncake

vLLM integrates Mooncake's distributed KV cache to solve the bottleneck of recomputing long context prefixes in agentic workloads, achieving a 3.8x throughput increase and a 46x reduction in time-to-first-token.

vLLM Blog

Tag: Caching Technology (1 article)
