The Open Agent Leaderboard
Hugging Face and IBM launch the Open Agent Leaderboard, shifting evaluation from standalone models to full agent systems (including tools, planning, and memory) while measuring both performance and cost.
Hugging Face identifies a bottleneck in continuous batching, where the CPU and GPU alternately sit idle waiting on each other, and shows that running the two workloads asynchronously so they overlap yields a 24% throughput boost essentially for free.
ServiceNow AI discovered that subtle differences in vLLM V1's inference engine could crash RL training, and restored stability by fixing four critical backend issues.
Hugging Face argues that the rise of AI-driven autonomous cybersecurity systems (like Mythos) reveals the critical structural advantage of open source in enabling distributed defense and mitigating risks from closed-source software.
LangChain argues that building better AI agents hinges on improving their 'harness' rather than the model itself, and shares a systematic method using evals as training signals for iterative improvement.