Open Models Have Crossed a Threshold
LangChain's evaluations show that open models like GLM-5 and MiniMax M2.7 now match closed frontier models on core agent tasks such as file operations and tool use, at a fraction of the cost and with lower latency.
Key Points
- Open models (GLM-5, MiniMax M2.7) now perform on par with closed frontier models on core agent tasks
- Massive cost advantage: MiniMax M2.7's output tokens cost ~1/20th as much as Claude Opus 4.6's, translating to ~$87k in annual savings at high volume
- Lower latency with open models (e.g., GLM-5 averages 0.65s vs Claude Opus 4.6's 2.56s), crucial for interactive products
- LangChain's Deep Agents evaluation framework assesses model agent capabilities across correctness, solve rate, step ratio, and tool call ratio
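The ~$87k savings figure is easy to reproduce from the quoted per-token prices. The sketch below uses the prices stated in this article; the 10M-output-tokens-per-day workload is an assumption chosen to match the quoted annual number.

```python
# Worked example of the cost math behind the ~$87k figure.
# Prices come from the article; the 10M-tokens/day workload is an
# assumption, not a figure stated by LangChain.

OPUS_PRICE_PER_M = 25.0     # Claude Opus 4.6, $ per million output tokens
MINIMAX_PRICE_PER_M = 1.2   # MiniMax M2.7, $ per million output tokens

daily_output_tokens_m = 10.0  # assumed: 10M output tokens per day
days_per_year = 365

annual_opus = OPUS_PRICE_PER_M * daily_output_tokens_m * days_per_year
annual_minimax = MINIMAX_PRICE_PER_M * daily_output_tokens_m * days_per_year
annual_savings = annual_opus - annual_minimax

print(f"Opus:    ${annual_opus:,.0f}/yr")     # $91,250/yr
print(f"MiniMax: ${annual_minimax:,.0f}/yr")  # $4,380/yr
print(f"Savings: ${annual_savings:,.0f}/yr")  # $86,870/yr, i.e. ~$87k
```

At lower volumes the absolute savings shrink proportionally, but the ~20x price ratio holds regardless of workload.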
Analysis
You might assume that building the smartest AI requires the most expensive closed-source models. But LangChain's latest evaluation results suggest the rules of the game have changed when it comes to building AI agents.

The reason is simple: developers face two major real-world constraints when deploying agents: cost and latency. Closed frontier models, while powerful, are expensive (Claude Opus 4.6 charges $25 per million output tokens) and relatively slow. When your application outputs tens of millions of tokens daily, the cost difference can reach ~$87k annually. And users have little tolerance for latency in interactive products; response times over 2 seconds are often unacceptable.

LangChain tested several open models using their Deep Agents framework, which is designed specifically to evaluate agent capabilities. The focus was not on how "smart" a model is, but on whether it can reliably perform the fundamental tasks essential for building agents: file operations, tool use, and following structured instructions. These are the "entry requirements" that determine whether a model is usable inside an agent framework.

The results are encouraging. GLM-5 and MiniMax M2.7 achieved correctness scores of 0.64 and 0.57 on core tasks, close to those of closed-source models. More importantly, they excelled on efficiency: their step ratios and tool call ratios were near 1.0, meaning they complete tasks in the expected, economical way without "taking detours" or making unnecessary calls. As for cost, MiniMax M2.7's output price is only $1.2 per million tokens, one-twentieth of Claude Opus 4.6's. On latency, GLM-5 served via Baseten averages just 0.65 seconds, less than a third of Claude Opus 4.6's 2.56 seconds.

This reveals a deeper trend: open models are moving from "usable" to "good and economical." In the past, open models were often seen as a compromise for budget-constrained projects, or as unreliable for specific tasks.
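The efficiency metrics above are simple ratios of observed agent behavior against a reference trajectory. The sketch below is an illustrative reimplementation, not LangChain's actual Deep Agents code; the function names and trajectory format are assumptions.

```python
# Illustrative agent-efficiency metrics. Names and the trajectory
# format are assumptions, not LangChain's actual evaluation API.

def step_ratio(actual_steps: int, expected_steps: int) -> float:
    """~1.0 means the agent took roughly the expected number of steps;
    values well above 1.0 mean it 'took detours'."""
    return actual_steps / expected_steps

def tool_call_ratio(actual_calls: int, expected_calls: int) -> float:
    """~1.0 means no redundant or unnecessary tool calls."""
    return actual_calls / expected_calls

# Hypothetical run: a task with a 4-step, 3-tool-call reference solution.
print(step_ratio(4, 4))       # 1.0 -> solved in the expected number of steps
print(tool_call_ratio(5, 3))  # ~1.67 -> made extra tool calls
```

Combined with a binary solve rate and a correctness score on the final output, ratios like these distinguish a model that merely finishes tasks from one that finishes them economically.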
But now, in the agent scenario, with its high demands on reliability and efficiency, they have crossed the practicality threshold. For the vast majority of production environments that need agents, whether customer service bots, data analysis assistants, or automated workflows, developers can prioritize open-source solutions and reserve closed models for the few complex tasks that truly require top-tier reasoning capabilities.

Trend One: model selection strategy is shifting from "one model for everything" to "layered routing." Smart architectures dynamically allocate by task complexity: simple, high-frequency tasks go to low-cost open models, while complex, critical tasks invoke closed models. This can reduce overall costs by an order of magnitude.

Trend Two: inference infrastructure becomes a key differentiator. Open models achieve low latency thanks to specialized inference providers like Groq, Fireworks, and Baseten. This means that, going forward, part of a model's effective capability will be reflected in its inference ecosystem, not just its raw weights.

Practical value for you: if you are developing AI agents, add GLM-5 or MiniMax M2.7 to your technology evaluation list now. Use LangChain's evaluation dimensions (correctness, solve rate, step ratio) to verify their performance on your specific tasks. You will likely find that for 80% of routine operations these open models are already sufficient, and the saved costs and faster responses will give your product a significant advantage in user experience and commercial viability. Stop defaulting to the most expensive model; it is time to reevaluate your AI cost structure.
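The layered routing described in Trend One can be sketched in a few lines. Everything here is a placeholder: the model identifiers, the toy complexity heuristic, and the threshold are assumptions; in production the heuristic might itself be a small classifier model.

```python
# Minimal sketch of layered model routing (Trend One). Model identifiers,
# the heuristic, and the threshold are placeholders, not a real API.

CHEAP_OPEN_MODEL = "glm-5"          # assumed identifier
FRONTIER_MODEL = "claude-opus-4.6"  # assumed identifier

def estimate_complexity(task: str) -> float:
    """Toy heuristic: long prompts and planning/debugging keywords
    score higher. Returns a value in [0, 1]."""
    score = min(len(task) / 2000, 1.0)
    if any(kw in task.lower() for kw in ("prove", "plan", "multi-step", "debug")):
        score += 0.5
    return min(score, 1.0)

def route(task: str, threshold: float = 0.5) -> str:
    """Send simple, high-frequency tasks to the cheap open model and
    reserve the frontier model for genuinely complex work."""
    if estimate_complexity(task) >= threshold:
        return FRONTIER_MODEL
    return CHEAP_OPEN_MODEL

print(route("List the files in /tmp"))                                  # glm-5
print(route("Plan a multi-step refactor and debug the failing tests"))  # claude-opus-4.6
```

Because simple tasks typically dominate production traffic, even a crude router like this captures most of the cost reduction; a misrouted complex task degrades quality, so the threshold is worth tuning against your own eval set.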
Analysis generated by BitByAI