Human judgment in the agent improvement loop
LangChain argues that building reliable AI agents requires systematically integrating domain experts' tacit knowledge and judgment throughout the development lifecycle, rather than relying solely on the model's own capabilities.
Key Points
- Reliable agents need to absorb experts' tacit knowledge (e.g., trading conventions, database experience)
- Human judgment should be integrated across the entire agent development lifecycle: workflow design, tool design, and context building
- Balance deterministic code and LLM autonomy based on business risk, e.g., forcing compliance checks
- Tool design requires trade-offs between flexibility and control, validated through evaluations that satisfy all stakeholders
- The industry trend is moving from single system prompts to providing agents with richer, structured domain context
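The deterministic-code-vs-LLM-autonomy balance above can be sketched as a hard-coded gate in front of the agent. This is a minimal illustration, not code from the LangChain post: the names (`check_compliance`, `run_agent_step`) and the restricted-ticker rule are hypothetical.

```python
# Sketch: a deterministic compliance gate runs before the LLM agent,
# so the refusal is guaranteed by code, not by model behavior.
# All names and the restricted list are hypothetical.

RESTRICTED_TICKERS = {"ACME", "GLOBEX"}  # hypothetical restricted list

def check_compliance(request: dict) -> tuple[bool, str]:
    """Deterministic gate: a hard-coded rule, no LLM involved."""
    ticker = request.get("ticker", "").upper()
    if ticker in RESTRICTED_TICKERS:
        return False, f"{ticker} is on the restricted list"
    return True, "ok"

def run_agent_step(request: dict) -> str:
    """Stand-in for the LLM call; a real system would invoke an agent here."""
    return f"Fetched market data for {request['ticker']}"

def handle_request(request: dict) -> str:
    ok, reason = check_compliance(request)
    if not ok:
        # The agent never sees the request; the block is enforced in code.
        return f"Blocked by compliance: {reason}"
    return run_agent_step(request)  # the LLM plans freely past this point
```

The point of the pattern is that the high-risk decision never depends on the model: the gate costs no tokens and cannot be talked around.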
Analysis
Have you ever built an impressive AI agent demo that falls apart in real-world use? A recent LangChain blog post hits the core issue: we've been too focused on making the model smarter, while overlooking the 'tacit knowledge' that actually makes business processes work.

The article uses a financial firm's trader assistant as its running example. Automating market data requests with an agent seems straightforward—the challenge isn't generating SQL, but teaching the agent what 'today's exposure' means internally, or which database tables are authoritative. This knowledge lives in senior employees' heads, not in manuals. The author calls this 'tacit knowledge' and argues that systematically injecting this human judgment is the real key to building reliable agents.

The post offers a practical three-step framework. First, workflow design: while LLMs excel at self-planning, using deterministic code for critical steps (like compliance checks) can reduce latency, save tokens, and enforce hard guarantees—like hard-coded traffic rules for a self-driving car. Second, tool design: offering a flexible 'execute_sql' tool is powerful but risky; parameterized tools are safer but limited. This choice requires evaluation and buy-in from all stakeholders—tech, business, and risk teams. Third, context building: the industry is moving away from cramming everything into one system prompt. Structured approaches, like Anthropic's Skills standard, deliver better results by providing curated documentation, examples, and domain rules.

This points to a deeper trend: competition in AI agents is shifting from 'whose model is smarter' to 'who can better engineer domain knowledge.' For developers, our role is evolving—we're becoming translators and architects bridging business expertise and AI capabilities. Next time you design an agent, the first question shouldn't be 'which model,' but 'which business expert should I invite to sit beside me.'
Analysis generated by BitByAI