Tag: AI Inference (2 articles)

Serving Agentic Workloads at Scale with vLLM x Mooncake

By integrating Mooncake's distributed KV cache store, vLLM avoids recomputing long-context prefixes in AI agent workloads, achieving 3.8x higher throughput and a 46x lower time-to-first-token.

vLLM Blog · May 6, 2026

DeepInfra on Hugging Face Inference Providers 🔥

Hugging Face integrates the cost-effective inference platform DeepInfra into its Inference Providers ecosystem, offering developers more model choices, flexible billing, and a unified API.

Hugging Face Blog · Apr 29, 2026
BitByAI — AI-powered, AI-evolved AI News