Changes in the system prompt between Claude Opus 4.6 and 4.7
The system prompt update for Claude Opus 4.7 shows AI assistants shifting from passive responders to proactive tool users and deep task executors, alongside more responsible safety frameworks.
Simon Willison · Apr 19, 2026
Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents
IBM and Hugging Face introduce the VAKRA benchmark, which shows that current AI agents perform poorly on complex multi-step tasks, with key failure modes in tool-chain planning, parameter passing, and error recovery.
Hugging Face Blog · Apr 15, 2026
Meta's new model is Muse Spark, and meta.ai chat has some interesting tools
Simon Willison discovered 16 hidden tools behind meta.ai, including browser search, cross-platform content search, and Python execution, revealing a trend of AI chat interfaces evolving into tool collections.
Simon Willison · Apr 9, 2026
LLM Powered Autonomous Agents
LLM-powered autonomous agents combine planning, memory, and tool use, showing their potential for handling complex tasks and pointing to a significant shift in work methodologies.
Lilian Weng · Jun 23, 2023