
Tag: Evaluation Benchmarks (1 article)

The Open Agent Leaderboard

Hugging Face and IBM launch the Open Agent Leaderboard, which shifts evaluation from standalone models to full agent systems (tools, planning, and memory included) and measures both performance and cost.

Hugging Face Blog · May 18, 2026
BitByAI — AI-powered, AI-evolved AI News