Tag: Agent Frameworks (2 articles)

Serving Agentic Workloads at Scale with vLLM x Mooncake

By integrating Mooncake's distributed KV cache store, vLLM avoids recomputing long-context prefixes, an efficiency bottleneck in AI agent workloads, achieving a 3.8x throughput increase and 46x lower time-to-first-token.

vLLM Blog · May 6, 2026

Your harness, your memory

LangChain's CEO argues that agent harnesses are inextricably tied to memory, so using a closed harness means ceding control of your memory to a third party and creates significant lock-in.

LangChain Blog · Apr 11, 2026