Tag: Systems Engineering (6 articles)

The Open Agent Leaderboard

Hugging Face and IBM launch the Open Agent Leaderboard, shifting evaluation from standalone models to full agent systems (including tools, planning, memory), while measuring both performance and cost.

Hugging Face Blog · May 18, 2026

Unlocking asynchronicity in continuous batching

Hugging Face identifies a bottleneck in continuous batching, where the CPU and GPU alternate waiting on each other, and shows how running their workloads asynchronously can yield a free 24% throughput boost.

Hugging Face Blog · May 14, 2026

vLLM V0 to V1: Correctness Before Corrections in RL

ServiceNow AI discovered that subtle differences in vLLM V1's inference engine could crash RL training, and restored stability by fixing four critical backend issues.

Hugging Face Blog · May 7, 2026

AI and the Future of Cybersecurity: Why Openness Matters

Hugging Face argues that the rise of AI-driven autonomous cybersecurity systems (like Mythos) reveals the critical structural advantage of open source in enabling distributed defense and mitigating risks from closed-source software.

Hugging Face Blog · Apr 21, 2026

An update on recent Claude Code quality reports

Anthropic clarifies that the Claude Code quality issues were not model-related but stemmed from three complex bugs in the engineering framework, revealing deep challenges in AI agent systems engineering.

Simon Willison

Better Harness: A Recipe for Harness Hill-Climbing with Evals

LangChain introduces the 'Better-Harness' system, which treats evaluations as 'training data' for agents and iteratively optimizes the engineering framework (harness) to improve agent performance, with a core focus on avoiding overfitting and achieving generalization.

LangChain Blog