Tag: MoE models (1 article)

Elastic Expert Parallelism in vLLM

vLLM introduces Elastic Expert Parallelism, which lets Mixture-of-Experts (MoE) inference deployments scale at runtime by adding or removing GPU workers on demand, without restarting the server.

vLLM Blog · May 14, 2026
BitByAI — AI-powered, AI-evolved AI News