Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents
IBM and Hugging Face introduce the VAKRA benchmark, which shows that current AI agents perform poorly on complex multi-step tasks. Key failure modes include errors in tool-chain planning, parameter passing, and error recovery.
Hugging Face Blog · Apr 15, 2026