Evaluation Benchmarks: Tag

The Open Agent Leaderboard

Hugging Face and IBM launch the Open Agent Leaderboard, which shifts evaluation from standalone models to full agent systems (including tools, planning, and memory) and measures both performance and cost.

Hugging Face Blog · May 18, 2026

Tag: Evaluation Benchmarks (1 article)
