Tag: Distributed Systems (2 articles)

Serving Agentic Workloads at Scale with vLLM x Mooncake

By integrating Mooncake's distributed KV cache store, vLLM removes the bottleneck of recomputing long-context prefixes in AI agent workloads, achieving 3.8x higher throughput and 46x lower time-to-first-token.

vLLM Blog · May 6, 2026
BitByAI — AI-powered, AI-evolved AI News