Human judgment in the agent improvement loop
LangChain argues that building reliable AI agents requires systematically integrating domain experts' tacit knowledge and judgment throughout the development lifecycle, rather than relying solely on the model's own capabilities.
Key Points
- Reliable agents need to absorb experts' tacit knowledge (e.g., trading conventions, database experience)
- Human judgment should be integrated across the entire agent development lifecycle: workflow design, tool design, and context building
- Balance deterministic code and LLM autonomy based on business risk, e.g., forcing compliance checks
- Tool design requires trade-offs between flexibility and control, validated through evaluations that satisfy all stakeholders
- The industry trend is moving from single system prompts to providing agents with richer, structured domain context
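The deterministic-code-vs-LLM-autonomy balance above can be sketched as a hard-coded gate in front of the agent. This is a minimal illustration, not code from the LangChain post: the names (`check_compliance`, `run_agent_step`) and the restricted-ticker rule are hypothetical.

```python
# Sketch: a deterministic compliance gate runs before the LLM agent,
# so the refusal is guaranteed by code, not by model behavior.
# All names and the restricted list are hypothetical.

RESTRICTED_TICKERS = {"ACME", "GLOBEX"}  # hypothetical restricted list

def check_compliance(request: dict) -> tuple[bool, str]:
    """Deterministic gate: a hard-coded rule, no LLM involved."""
    ticker = request.get("ticker", "").upper()
    if ticker in RESTRICTED_TICKERS:
        return False, f"{ticker} is on the restricted list"
    return True, "ok"

def run_agent_step(request: dict) -> str:
    """Stand-in for the LLM call; a real system would invoke an agent here."""
    return f"Fetched market data for {request['ticker']}"

def handle_request(request: dict) -> str:
    ok, reason = check_compliance(request)
    if not ok:
        # The agent never sees the request; the block is enforced in code.
        return f"Blocked by compliance: {reason}"
    return run_agent_step(request)  # the LLM plans freely past this point
```

The point of the pattern is that the high-risk decision never depends on the model: the gate costs no tokens and cannot be talked around.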
Analysis
Have you ever built an impressive AI agent demo that falls apart in real-world use? A recent LangChain blog post hits the core issue: we've been too focused on making the model smarter, while overlooking the 'tacit knowledge' that actually makes business processes work.

The article uses a financial firm's trader assistant as its running example. Automating market data requests with an agent seems straightforward—the challenge isn't generating SQL, but teaching the agent what 'today's exposure' means internally, or which database tables are authoritative. This knowledge lives in senior employees' heads, not in manuals. The author calls this 'tacit knowledge' and argues that systematically injecting this human judgment is the real key to building reliable agents.

The post offers a practical three-step framework. First, workflow design: while LLMs excel at self-planning, using deterministic code for critical steps (like compliance checks) can reduce latency, save tokens, and enforce hard guarantees—like hard-coded traffic rules for a self-driving car. Second, tool design: offering a flexible 'execute_sql' tool is powerful but risky; parameterized tools are safer but limited. This choice requires evaluation and buy-in from all stakeholders—tech, business, and risk teams. Third, context building: the industry is moving away from cramming everything into one system prompt. Structured approaches, like Anthropic's Skills standard, deliver better results by providing curated documentation, examples, and domain rules.

This points to a deeper trend: competition in AI agents is shifting from 'whose model is smarter' to 'who can better engineer domain knowledge.' For developers, our role is evolving—we're becoming translators and architects bridging business expertise and AI capabilities. Next time you design an agent, the first question shouldn't be 'which model,' but 'which business expert should I invite to sit beside me.'
Analysis generated by BitByAI