Native RL APIs in vLLM
vLLM introduces native Reinforcement Learning APIs to standardize weight synchronization and improve asynchronous training support, addressing key pain points of framework fragmentation and fragile deployments in online RL for large models.
vLLM Blog · May 28, 2026
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Hugging Face's TRL library introduces delta weight sync, transmitting only the ~1-2% of weights that change between RL steps, reducing sync overhead by two orders of magnitude and making trillion-parameter async RL training dramatically cheaper.
Hugging Face Blog · May 27, 2026
Announcing VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models
VeRL-Omni is a reinforcement learning training framework for multimodal generative models. It addresses the engineering challenges of efficient, stable RL training on diffusion and omni-modality models and extends the LLM RL training paradigm to image, video, and audio generation.
vLLM Blog · May 14, 2026
vLLM V0 to V1: Correctness Before Corrections in RL
ServiceNow AI discovered that subtle differences in vLLM V1's inference engine could crash RL training, and restored stability by fixing four critical backend issues.
Hugging Face Blog · May 7, 2026
Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
This work extends reinforcement learning environments from logic puzzles to e-commerce conversations, using 8 algorithmically verifiable scenarios to take AI agents from 'chatting well' to 'getting things done'.
Hugging Face Blog · Apr 16, 2026
ChatGPT voice mode is a weaker model
Simon Willison reveals a counterintuitive fact: ChatGPT's voice mode runs on an older, weaker GPT-4o-era model, creating a massive gap between user expectations and reality.
Simon Willison · Apr 10, 2026
Reward Hacking in Reinforcement Learning
A comprehensive analysis of reward hacking in RL, covering causes, real-world examples, and mitigation strategies with special focus on RLHF for LLMs.
Lil'Log · Apr 5, 2026
Reward Hacking in Reinforcement Learning
Reward hacking arises in reinforcement learning from flaws in reward functions and particularly affects language models, calling for further research and mitigation strategies.
Lilian Weng · Nov 28, 2024
The Transformer Family Version 2.0
Lilian Weng's new article explores in depth the evolution and new features of Transformers and their ongoing impact on natural language processing.
Lilian Weng · Jan 27, 2023