Tag: Failure Analysis

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

IBM and Hugging Face introduce the VAKRA benchmark, which shows that current AI agents perform poorly on complex multi-step tasks. Key failure modes include tool-chain planning, parameter passing, and error recovery.

Hugging Face Blog · Apr 15, 2026

Tag: Failure Analysis (1 article)
