Tag: Distributed Training (3 articles)

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

NVIDIA NeMo AutoModel plugs into the Hugging Face ecosystem, raising MoE fine-tuning throughput by 3.4x-3.7x and cutting VRAM usage by 30% with a one-line import change.

Hugging Face Blog · Jun 25, 2026

Native RL APIs in vLLM

vLLM introduces native Reinforcement Learning APIs to standardize weight synchronization and improve asynchronous training support, addressing key pain points of framework fragmentation and fragile deployments in online RL for large models.

vLLM Blog · May 28, 2026

Building Blocks for Foundation Model Training and Inference on AWS

AWS details the infrastructure that supports the full foundation model lifecycle, from pre-training and post-training to inference. The post describes a shift from a single scaling law to three, and the increasingly deep integration of open-source software stacks with cloud infrastructure.

Hugging Face Blog ·