Multi-step Reasoning (Tag)

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

IBM and Hugging Face introduce the VAKRA benchmark and report that current AI agents perform poorly on complex multi-step tasks. The key failure modes are tool-chain planning, parameter passing, and error recovery.

Hugging Face Blog · Apr 15, 2026

Tag: Multi-step Reasoning (1 article)
