Tag: Tool Calling (4 articles)

Better Models: Worse Tools

Newer Claude models are increasingly making mistakes when calling third-party edit tools, likely because Anthropic over-trained them on Claude Code's own tool syntax, degrading general tool-use ability and highlighting platform lock-in risks in AI training.

Simon Willison · Jul 5, 2026

Changes in the system prompt between Claude Opus 4.6 and 4.7

The system prompt update for Claude Opus 4.7 shows AI assistants shifting from passive responders to proactive tool users and deep task executors, alongside a more responsible safety framework.

Simon Willison · Apr 19, 2026

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

IBM and Hugging Face introduce the VAKRA benchmark, revealing that current AI agents perform poorly on complex multi-step tasks, with key failure modes in tool-chain planning, parameter passing, and error recovery.

Hugging Face Blog · Apr 15, 2026

LLM Powered Autonomous Agents

LLM-powered autonomous agents combine planning, memory, and tool use, showing their potential for handling complex tasks and pointing to a significant shift in how work gets done.

Lilian Weng · Jun 23, 2023