Reward Hacking in Reinforcement Learning
A comprehensive analysis of reward hacking in RL, covering causes, real-world examples, and mitigation strategies with special focus on RLHF for LLMs.
Lil'Log · Apr 5, 2026
A comprehensive analysis of reward hacking in RL, covering causes, real-world examples, and mitigation strategies with special focus on RLHF for LLMs.