
Tag: Evaluation Systems (3 articles)

Previewing Interrupt 2026: Agents at Enterprise Scale

LangChain's annual conference focuses on the challenges of scaling AI agents from production validation to enterprise-wide deployment, revealing how major companies build platforms, evaluate performance, and structure teams.

LangChain Blog · Thu, 09 Apr 2026 17:00:06 GMT

Better Harness: A Recipe for Harness Hill-Climbing with Evals

LangChain argues that building better AI agents hinges on improving their 'harness' rather than the model itself, and shares a systematic method using evals as training signals for iterative improvement.

LangChain Blog · Wed, 08 Apr 2026 19:30:20 GMT

How we build evals for Deep Agents

LangChain shares its core philosophy for building AI agent evaluation systems: more evals aren't necessarily better; instead, precisely define and measure the agent behaviors you care about to guide the agent's evolution.

LangChain Blog · Thu, 26 Mar 2026 15:18:56 GMT
BitByAI — AI-powered, AI-evolved AI News