Tag: Distributed Systems

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Hugging Face's TRL library introduces delta weight sync, which transmits only the ~1-2% of weights that change between RL steps. This reduces sync overhead by two orders of magnitude and makes asynchronous RL training of trillion-parameter models dramatically cheaper.

Hugging Face Blog · May 27, 2026
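
As a rough illustration of the delta idea in the summary above, the sketch below sends only the positions whose values changed and patches a receiver's copy in place. It is a toy under stated assumptions, not TRL's implementation: the function names `compute_delta` and `apply_delta` are hypothetical, weights are modeled as a flat Python list, and the 1% change rate is chosen to match the summary's lower bound.

```python
from typing import Dict, List


def compute_delta(old: List[float], new: List[float]) -> Dict[int, float]:
    """Return {index: new_value} for every position whose value changed."""
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_delta(weights: List[float], delta: Dict[int, float]) -> None:
    """Patch the receiver's weights in place using a sparse delta."""
    for index, value in delta.items():
        weights[index] = value


# Hypothetical trainer state: 1,000 weights, of which 10 (1%) change in a step.
trainer_old = [0.0] * 1000
trainer_new = list(trainer_old)
for i in range(0, 1000, 100):
    trainer_new[i] = 0.5

# Only the changed entries are "shipped" to the inference replica.
delta = compute_delta(trainer_old, trainer_new)
replica = list(trainer_old)
apply_delta(replica, delta)

print(f"entries sent: {len(delta)} of {len(trainer_new)}")
print(f"replica in sync: {replica == trainer_new}")
```

A real payload also has to carry the indices alongside the values, so the byte savings would be somewhat smaller than the raw fraction of changed weights suggests.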

Serving Agentic Workloads at Scale with vLLM x Mooncake

vLLM integrates Mooncake's distributed KV cache to solve the bottleneck of recomputing long context prefixes in agentic workloads, achieving a 3.8x throughput increase and a 46x reduction in time-to-first-token.

vLLM Blog
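
To make the prefix-reuse idea in the summary above concrete, here is a minimal sketch of a cache that returns the longest previously seen token prefix, so only the remaining suffix needs to be computed. It is an assumption-laden toy, not the vLLM or Mooncake API: the `PrefixCache` class is hypothetical, the cached "state" is a placeholder string standing in for KV tensors, and lookup is a simple linear scan rather than a distributed store.

```python
from typing import Dict, List, Optional, Tuple


class PrefixCache:
    """Toy prefix cache: maps a token prefix to a placeholder cached state."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def put(self, tokens: List[int], state: str) -> None:
        """Remember the state computed for this exact token sequence."""
        self._store[tuple(tokens)] = state

    def longest_prefix(self, tokens: List[int]) -> Tuple[int, Optional[str]]:
        """Return (matched_length, state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                return end, self._store[key]
        return 0, None


cache = PrefixCache()
# An earlier agent turn already processed tokens [1, 2, 3].
cache.put([1, 2, 3], "kv-for-1-2-3")

# The next turn extends the same context with two new tokens.
request = [1, 2, 3, 4, 5]
hit_len, state = cache.longest_prefix(request)
to_compute = request[hit_len:]

print(f"reused {hit_len} tokens, computing {len(to_compute)} new tokens")
```

In agentic workloads each turn tends to resend a long shared context, which is why reusing the cached prefix instead of recomputing it can matter for time-to-first-token.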

Tag: Distributed Systems (2 articles)
