
Tag: Reinforcement Learning (9 articles)

Native RL APIs in vLLM

vLLM introduces native reinforcement learning APIs that standardize weight synchronization and improve asynchronous training support, addressing two pain points of online RL for large models: framework fragmentation and fragile deployments.

vLLM Blog · May 28, 2026

ChatGPT voice mode is a weaker model

Simon Willison points out that ChatGPT's voice mode runs on an older, weaker GPT-4o-era model, leaving a wide gap between what users expect and what they get.

Simon Willison · Apr 10, 2026

Reward Hacking in Reinforcement Learning

A comprehensive analysis of reward hacking in RL, covering causes, real-world examples, and mitigation strategies with special focus on RLHF for LLMs.

Lil'Log · Apr 5, 2026

Reward Hacking in Reinforcement Learning

Reward hacking arises in reinforcement learning from flaws in reward functions. It particularly affects language models and calls for further research into mitigation strategies.

Lilian Weng · Nov 28, 2024

The Transformer Family Version 2.0

Lilian Weng's article explores the evolution and new features of Transformers and their ongoing impact on natural language processing.

Lilian Weng · Jan 27, 2023