Tag: Developer Tools (117 articles)

One Click from Hugging Face to SageMaker Studio: The Last Mile Between Cloud and Open Models

Hugging Face and Amazon SageMaker AI now offer deep-link integration, allowing developers to jump directly to a pre-configured SageMaker Studio environment for model fine-tuning or deployment with a single click.

Hugging Face Blog · Jul 8, 2026

🤗 Kernels: Major Updates

Hugging Face introduces a new 'kernel' repository type on the Hub, improves security with reproducible builds and trusted publishers, and expands framework support, laying the foundation for a standardized custom GPU kernel ecosystem.

Hugging Face Blog · Jul 6, 2026

Better Models: Worse Tools

Newer Claude models are increasingly making mistakes when calling third-party edit tools, likely because Anthropic over-trained them on Claude Code's own tool syntax, degrading general tool-use ability and highlighting platform lock-in risks in AI training.

Simon Willison · Jul 5, 2026

Open Source AI Gap Map

Current AI's newly released Gap Map indexes 421 open source AI products, revealing structural gaps and opportunities for developers.

Simon Willison · Jul 4, 2026

Quoting Josh W. Comeau

Multiple developer course creators report revenue drops of over 50% as AI both shakes confidence in career prospects and offers free personalized learning alternatives, posing a serious challenge to traditional tech education.

Simon Willison · Jul 4, 2026

What's new in Claude Sonnet 5

Claude Sonnet 5 brings Opus-level performance at Sonnet prices, but a tokenizer change effectively raises costs by 30% for English users; removed sampling params and default thinking mode add more hidden costs.

Simon Willison · Jul 1, 2026

Ending AI Evaluation Anarchy: How Hugging Face and EEE Are Building a Trusted Record for Model Performance

EEE and Hugging Face Community Evals are now integrated, enabling standardized evaluation results with full metadata to be posted directly on model pages, solving the problem of scattered, incomparable scores and moving the industry toward evaluation transparency.

Hugging Face Blog · Jun 30, 2026

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

Simon Willison reviews the open-source Ornith-1.0 model, highlighting its efficient tool calling and code understanding for agentic tasks, signaling new advances in open agentic coding models.

Simon Willison · Jun 30, 2026

Incident Report: CVE-2026-LGTM

A fictional incident report about dueling AI review agents reveals real risks of uncontrolled costs and multi-agent conflicts in AI-powered supply chain security.

Simon Willison · Jun 27, 2026

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

IBM's open-source CUGA liberates agent development from heavy orchestration frameworks, using built-in planning and reflection to enable smaller models to reliably handle complex, long-horizon tasks.

Hugging Face Blog · Jun 23, 2026

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Hugging Face challenges LoRA's dominance in parameter-efficient fine-tuning, exploring whether there are better alternatives developers might be missing.

Hugging Face Blog · Jun 18, 2026

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face introduces agent-friendly tooling, showing via process-focused benchmarking that optimizing CLIs and docs can save AI agents 1.3x–6x in token costs.

Hugging Face Blog · Jun 18, 2026

olmo-eval: An evaluation workbench for the model development loop

Allen AI releases olmo-eval, shifting evaluation from final benchmarking to an iterative development loop with prompt-level analysis and flexible execution.

Hugging Face Blog · Jun 12, 2026

Claude Fable is relentlessly proactive

Without explicit instructions to use browser automation, Claude Fable 5 autonomously wrote HTML test pages, controlled browsers, and took screenshots to debug a UI bug.

Simon Willison · Jun 12, 2026

DiffusionGemma

Google open-sources DiffusionGemma, applying diffusion architecture to text generation for the first time, achieving over 500 tokens/sec and offering a new paradigm for high-throughput scenarios.

Simon Willison · Jun 11, 2026

If Claude Fable stops helping you, you'll never know

Anthropic's silent restrictions on Claude Fable's assistance for rival AI development tasks have sparked a fierce debate about AI transparency versus commercial interests.

Simon Willison · Jun 10, 2026

Quoting Andreas Kling

The Ladybird browser now rejects public pull requests because AI-generated code blurs contributor responsibility, highlighting a trust crisis that open source faces in the AI era.

Simon Willison · Jun 6, 2026

Designing the hf CLI as an agent-optimized way to work with the Hub

Hugging Face redesigned its CLI to automatically optimize output for both humans and AI agents, finding up to 6× token savings on complex tasks compared to raw API calls.

Hugging Face Blog · Jun 5, 2026

How we contain Claude across products

Anthropic detailed their sandboxing techniques for constraining Claude across products, revealing core security engineering practices for building trustworthy AI agents.

Simon Willison · May 31, 2026

Claude Opus 4.8: "a modest but tangible improvement"

Anthropic releases Claude Opus 4.8, focusing not on performance leaps but on significantly improving model 'honesty' — less hallucination, more willingness to admit uncertainty, which may be a more important direction than benchmark scores.

Simon Willison · May 29, 2026

Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor

Poolside's 33B-parameter agentic coding model, Laguna XS.2, achieves 2-3x inference speedup without quality loss through native vLLM integration, DFlash speculative decoding, and LLM Compressor quantization.

vLLM Blog · May 28, 2026

Native RL APIs in vLLM

vLLM introduces native Reinforcement Learning APIs to standardize weight synchronization and improve asynchronous training support, addressing key pain points of framework fragmentation and fragile deployments in online RL for large models.

vLLM Blog · May 28, 2026

Quoting Armin Ronacher

Open-source maintainer Armin Ronacher highlights that AI-generated 'slop' issue reports are becoming a new burden for open-source communities, appearing professional but riddled with inaccuracies, wasting maintainers' time.

Simon Willison · May 25, 2026

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

NVIDIA's new diffusion language models generate tokens in parallel and refine them iteratively, potentially breaking the latency limits of traditional autoregressive models and enabling self-correction.

Hugging Face Blog · May 23, 2026

Datasette Agent

Simon Willison combines his LLM library with Datasette to create a conversational AI assistant that lets users query and visualize databases using natural language.

Simon Willison · May 22, 2026

Google I/O, Gemini Spark, Antigravity

Google announced its personal AI Agent, Gemini Spark, and the underlying Antigravity tooling, but the shift to closed-source and vague security promises foreshadow a battle over AI agent control and trust.

Simon Willison · May 20, 2026

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google released Gemini 3.5 Flash with a significant price hike, yet simultaneously deployed it across core products like Search and the Gemini app, revealing a shift from pure cost-effectiveness to paying for comprehensive model capabilities.

Simon Willison · May 20, 2026

OlmoEarth v1.1: A more efficient family of models

Allen AI releases OlmoEarth v1.1, reducing compute costs by up to 3x by optimizing token sequence length in transformer models for satellite imagery, while maintaining performance, making large-scale environmental monitoring AI more economically viable.

Hugging Face Blog · May 20, 2026

Introducing the Ettin Reranker Family

Hugging Face has released six Ettin reranker models of varying sizes, designed to significantly improve the accuracy of search and RAG systems at low cost through a 'retrieve-then-rerank' two-stage architecture.

Hugging Face Blog · May 19, 2026

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddleOCR 3.5 adds a Transformers inference backend, enabling developers to seamlessly use its OCR and document parsing models within the Hugging Face ecosystem, lowering integration barriers for building applications like RAG.

Hugging Face Blog · May 18, 2026

The Open Agent Leaderboard

Hugging Face and IBM launch the Open Agent Leaderboard, shifting evaluation from standalone models to full agent systems (including tools, planning, memory), while measuring both performance and cost.

Hugging Face Blog · May 18, 2026

Not so locked in any more

AI coding agents are driving down the cost of code rewrites and migrations to near zero, fundamentally undermining the 'lock-in' effect of technology stacks and making technology choices more flexible and reversible.

Simon Willison · May 15, 2026

Quoting Mitchell Hashimoto

Mitchell Hashimoto observes that modern programming languages have become highly fungible, as demonstrated by Bun's rapid migration from Zig to Rust, signaling a shift from language lock-in to on-demand tool replacement.

Simon Willison · May 15, 2026

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

IBM releases two Apache 2.0 open-source multilingual embedding models, where the 97-million-parameter compact version outperforms all models of similar size on various benchmarks, demonstrating the huge potential of 'small but mighty' models for specific tasks.

Hugging Face Blog · May 15, 2026

llm 0.32a2

This update to the LLM tool adds support for OpenAI's new /v1/responses endpoint, showing that model reasoning (especially between tool calls) is becoming central and that developers need to adapt to new interaction patterns.

Simon Willison · May 13, 2026

Thoughts on GitLab's workforce reduction and structural and strategic decisions

GitLab's radical restructuring reveals a deep trend: AI Agents are reducing software production costs, forcing companies to shift organizational structures from 'management-heavy' to 'small, autonomous delivery teams'.

Simon Willison · May 12, 2026

Quoting James Shore

James Shore warns that AI coding tools that only increase coding speed without reducing maintenance costs will lead to permanent technical debt inflation and "permanent indenture" for developers.

Simon Willison · May 12, 2026

Using LLM in the shebang line of a script

Simon Willison demonstrates integrating LLM tools into a script's shebang line, making natural language descriptions directly executable, signaling a major shift in programming interaction.

Simon Willison · May 12, 2026

Learning on the Shop floor

Shopify's CEO shares how their internal AI coding agent River, through a fully public collaboration model, transforms the entire company into a large-scale 'osmosis learning' workshop, revealing a novel paradigm for AI tool usage within organizations.

Simon Willison · May 11, 2026

Using Claude Code: The Unreasonable Effectiveness of HTML

A member of the Claude Code team argues that requesting output in HTML from AI is more effective than Markdown, leveraging its rich interactivity and visualization capabilities to significantly enhance clarity and user experience.

Simon Willison · May 9, 2026

CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

A specialized 4B cybersecurity model matches or outperforms an 8B generalist on key tasks, revealing the trend towards 'small, specialized, and local' AI deployment in security.

Hugging Face Blog · May 9, 2026

MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

A complete case study proving that developers can efficiently fine-tune large models on AMD MI300X GPUs through the seamless integration of the Hugging Face ecosystem and ROCm, breaking the ecosystem monopoly of NVIDIA CUDA.

Hugging Face Blog · May 8, 2026

Behind the Scenes Hardening Firefox with Claude Mythos Preview

Mozilla leveraged the Claude Mythos Preview and advanced harnessing techniques to find and fix 423 Firefox security vulnerabilities in one month, a 20x increase over their average, marking a qualitative shift in AI security auditing from noise generation to high-value signal production.

Simon Willison · May 8, 2026

Live blog: Code w/ Claude 2026

Anthropic showcased a comprehensive shift from a single model to a platform-centric, multi-agent collaboration paradigm at Code w/ Claude, focusing on enabling developers to build and run complex, long-duration agent tasks more efficiently.

Simon Willison · May 6, 2026

TRE Python binding — ReDoS robustness demo

Simon Willison demonstrates how the TRE regex library is immune to ReDoS attacks that cripple Python's built-in re module, exposing the fatal flaw of traditional backtracking engines.

Simon Willison · May 5, 2026

Codex CLI 0.128.0 adds /goal

OpenAI's Codex CLI introduces a /goal command that enables the coding agent to automatically loop until a goal is met or token budget exhausted, signaling a shift from single-shot Q&A to persistent task execution.

Simon Willison · May 1, 2026

We need RSS for sharing abundant vibe-coded apps

As AI lowers the barrier to app development, leading to a surge in personal, fragmented 'vibe-coded' apps, we need a new paradigm for app distribution and management, akin to RSS for blogs.

Simon Willison · May 1, 2026

LLM 0.32a0 is a major backwards-compatible refactor

Simon Willison's LLM library undergoes a major refactor, evolving from simple text prompts/responses to a structure supporting multi-turn message sequences and streaming mixed-type responses, adapting to modern LLMs' multimodal and tool-calling capabilities.

Simon Willison · Apr 30, 2026

DeepInfra on Hugging Face Inference Providers 🔥

Hugging Face integrates the cost-effective inference platform DeepInfra into its Inference Providers ecosystem, offering developers more model choices, flexible billing, and a unified API.

Hugging Face Blog · Apr 29, 2026

Quoting Matthew Yglesias

Matthew Yglesias's quote highlights two paths for AI-assisted programming: personal 'vibecoding' versus professional software companies using AI to build better products, with the latter being the more sustainable value creation model.

Simon Willison · Apr 28, 2026

What's new in pip 26.1 - lockfiles and dependency cooldowns!

pip 26.1 introduces native lockfiles (pylock.toml) and a dependency cooldown feature, aiming to enhance supply chain security and reproducibility in the Python ecosystem by locking dependency versions and avoiding overly new packages.

Simon Willison · Apr 28, 2026

microsoft/VibeVoice

Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.

Simon Willison · Apr 28, 2026

How to build scalable web apps with OpenAI's Privacy Filter

OpenAI has open-sourced a high-performance PII detection model, and when combined with the Gradio Server framework, developers can quickly build web applications that handle sensitive information, marking a shift where privacy protection is becoming a standard part of AI application development.

Hugging Face Blog · Apr 27, 2026

OpenAI's 'Unification' Ambition: GPT-5.5 Bids Farewell to Dedicated Code Models, Moving Towards General Agents

An OpenAI executive confirms GPT-5.5 will not have a dedicated code version, signaling that large models are moving from specialized capabilities to unified, general-purpose agent systems.

Simon Willison · Apr 25, 2026

GPT-5.5 prompting guide

OpenAI's official prompting guide for GPT-5.5 emphasizes it is not a drop-in replacement for GPT-5.2/5.4, requiring a fresh start in prompt engineering for optimal results.

Simon Willison · Apr 25, 2026

DeepSeek V4 - almost on the frontier, a fraction of the price

DeepSeek's V4 series delivers near-frontier performance at a fraction of the cost (Pro at $1.74/M input, Flash at just $0.14/M), potentially reshaping the cost-effectiveness standard for open-weight models.

Simon Willison · Apr 24, 2026

Extract PDF text in your browser with LiteParse for the web

Simon Willison adapted LlamaIndex's LiteParse into a pure browser-based version, enabling local PDF text extraction and OCR without a server, highlighting privacy and the importance of spatial text parsing.

Simon Willison · Apr 24, 2026

A pelican for GPT-5.5 via the semi-official Codex backdoor API

Although OpenAI's latest model GPT-5.5 hasn't officially launched its API, developers are already accessing it through a 'semi-official backdoor' in its Codex CLI using their ChatGPT subscription, revealing new dynamics in the battle over AI model distribution channels.

Simon Willison · Apr 24, 2026

How to Use Transformers.js in a Chrome Extension

Hugging Face shares a practical architecture for running AI models locally in Chrome extensions, revealing key design patterns for model deployment, messaging, and frontend-backend separation under Manifest V3.

Hugging Face Blog · Apr 23, 2026

Gemma 4 VLA Demo on Jetson Orin Nano Super

An end-to-end multimodal agent demo running on NVIDIA Jetson Orin Nano Super, showcasing how the model autonomously decides when to use the camera and answers questions with visual context, signaling the descent of powerful AI capabilities to edge devices.

Hugging Face Blog · Apr 22, 2026

Quoting Bobby Holley

Mozilla's CTO reports that using Anthropic's Claude AI, Firefox identified and fixed 271 vulnerabilities in an assessment, marking a shift where AI moves from an 'assistant' to a 'lead' role in security defense.

Simon Willison · Apr 22, 2026

Changes to GitHub Copilot Individual plans

GitHub Copilot tightens its individual plan due to the massive compute demands of AI agent workflows, halting sign-ups and restricting top models, signaling the unsustainability of per-request pricing in the agent era.

Simon Willison · Apr 22, 2026

AI Agents Are Too Human? A Counter-Intuitive Critique and Its Deeper Implications

An expert critiques current AI agents for being too 'human'—lacking rigor, patience, and focus, and tending to compromise when faced with difficulties, revealing fundamental flaws in their design.

Simon Willison · Apr 22, 2026

Claude Token Counter, now with model comparisons

Simon Willison's tool reveals that Claude Opus 4.7's new tokenizer inflates token counts by ~46% for text and up to 3x for images compared to its predecessor, leading to higher real-world costs despite unchanged official pricing.

Simon Willison · Apr 20, 2026

Claude system prompts as a git timeline

Simon Willison transformed Anthropic's published Claude system prompt history into a Git-based tool, enabling developers to trace prompt evolution like code changes, revealing a new paradigm for AI behavior debugging and understanding.

Simon Willison · Apr 18, 2026

Adding a new content type to my blog-to-newsletter tool

Simon Willison demonstrates an efficient prompt that enabled an AI coding assistant to complete a complex feature extension in one shot, revealing the core Agentic engineering pattern of 'explaining requirements with code'.

Simon Willison · Apr 18, 2026

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

Simon Willison's famous 'pelican riding a bicycle' benchmark surprisingly shows a locally-run, smaller Alibaba Qwen3.6 model outperforming the cloud-based, massive Claude Opus 4.7 in creative SVG generation, revealing the surprising potential of open-source models for specific tasks.

Simon Willison · Apr 17, 2026

When Developers Use AI to "Build" Tools: Insights from Simon Willison's Datasette News Previewer

Renowned developer Simon Willison shares how he used Claude AI to quickly build a YAML news preview tool for the Datasette project, demonstrating a new paradigm for AI-assisted development.

Simon Willison · Apr 16, 2026

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

This work extends reinforcement learning environments from logic puzzles to e-commerce conversations, using 8 algorithmically verifiable scenarios to train AI agents from 'chatting well' to 'getting things done'.

Hugging Face Blog · Apr 16, 2026

The PR you would have opened yourself

Hugging Face introduces a tool that uses AI to help port models from the transformers library to MLX, revealing the core tension in open-source maintenance during the code agent era: a surge in contributions versus code quality and community communication costs.

Hugging Face Blog · Apr 16, 2026

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face releases a new tutorial demonstrating how fine-tuned multimodal embedding models can far surpass general-purpose large models in specific domains (like visual document retrieval), even outperforming models with 4x their parameters.

Hugging Face Blog · Apr 16, 2026

Gemini 3.1 Flash TTS

Google's Gemini 3.1 Flash TTS is revolutionary because it uses detailed, screenplay-like prompts to precisely control emotion, accent, pace, and scene in speech synthesis, marking a shift from a 'tool' to a 'creative partner'.

Simon Willison · Apr 16, 2026

Trusted access for the next era of cyber defense

OpenAI launches GPT-5.4-Cyber, a model fine-tuned for defensive cybersecurity, and its "Trusted Access" program, signaling that leading AI companies are making cybersecurity a key battleground while seeking a new balance between safety and openness.

Simon Willison · Apr 15, 2026

The problem is that LLMs inherently lack the virtue of laziness

Bryan Cantrill argues that LLMs lack human laziness, which forces us to create elegant abstractions—and without this constraint, AI will make systems larger, not better.

Simon Willison · Apr 13, 2026

Deep Agents v0.5

LangChain introduces async subagents for its Deep Agents framework, enabling parallel task delegation and removing blocking bottlenecks in agent workflows.

LangChain Blog · Apr 8, 2026

Eight years of wanting, three months of building with AI

Lalit Maganti's experience reveals the potential and limitations of AI in software development, particularly the challenges of architectural design.

Simon Willison · Apr 6, 2026

Quoting Kyle Daigle

GitHub's COO reveals 1B commits in 2025 and GitHub Actions usage doubling annually, signaling exponential growth in developer activity.

Simon Willison · Apr 4, 2026

Gemma 4: Byte for byte, the most capable open models

Google DeepMind's Gemma 4 models innovate in parameter efficiency and support multi-modal inputs, marking a significant advancement in research on small effective models.

Simon Willison · Apr 3, 2026

Welcome Gemma 4: Frontier multimodal intelligence on device

Gemma 4 introduces enhanced multimodal capabilities, supporting image, text, and audio inputs, significantly improving model intelligence and deployment flexibility across devices.

Hugging Face Blog · Apr 2, 2026

Any Custom Frontend with Gradio's Backend

The introduction of Gradio.Server allows developers to use custom frontend frameworks while enjoying the robust backend support of Gradio, significantly enhancing application development flexibility and efficiency.

Hugging Face Blog · Apr 1, 2026

TRL v1.0: Post-Training Library Built to Move with the Field

The release of TRL v1.0 marks a significant shift in post-training libraries, designed to cope with the rapidly changing AI landscape while offering a stable yet experimental development environment.

Hugging Face Blog · Mar 31, 2026

Liberate your OpenClaw

With restrictions on Claude models in open agent platforms, Hugging Face offers two ways to help users quickly migrate and revive their OpenClaw agents, ensuring continued use of efficient open models.

Hugging Face Blog · Mar 27, 2026

Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

Modular Diffusers offers composable building blocks for easily creating customized diffusion pipelines, greatly enhancing flexibility and reusability.

Hugging Face Blog · Mar 5, 2026

Building News Agents for Daily News Recaps with MCP, Q, and tmux

The author shares how to build a multi-agent system using MCP and Q tools to automate daily news recap generation, showcasing the practical potential of new workflows.

Eugene Yan · May 4, 2025

LLM Powered Autonomous Agents

LLM powered autonomous agents combine planning, memory, and tool usage, showcasing their potential in handling complex tasks and indicating a significant shift in work methodologies.

Lilian Weng · Jun 23, 2023

An update on recent Claude Code quality reports

Anthropic clarifies that Claude Code quality issues were not model-related, but stemmed from three complex bugs in the engineering framework, revealing deep challenges in AI Agent system engineering.

Simon Willison ·

Claude is a space to think

Anthropic declares Claude will remain permanently ad-free, arguing that advertising incentives are fundamentally incompatible with the core goal of an AI assistant being genuinely helpful.

Anthropic News ·

Announcing Retrieval Harness

LlamaIndex launches Retrieval Harness, equipping AI agents with filesystem primitives like file listing, exact grep, and chunked reading to overcome the fragmentation of semantic search.

LlamaIndex Blog ·

Announcing the LangChain + MongoDB Partnership: The AI Agent Stack That Runs On The Database You Already Trust

LangChain and MongoDB have deeply integrated to transform Atlas into a unified AI agent backend with vector search, persistent memory, natural language querying, and full-stack observability, aiming to solve data silos and infrastructure complexity in production.

LangChain Blog ·

Arcade.dev tools now in LangSmith Fleet

LangChain integrates Arcade's 7,500+ agent-optimized tools into LangSmith Fleet, solving authentication, authorization, and reliability challenges for agent tool use through a single gateway.

LangChain Blog ·

Better Harness: A Recipe for Harness Hill-Climbing with Evals

LangChain introduces the 'Better-Harness' system, treating evaluations as 'training data' for agents, iteratively optimizing the engineering framework (harness) to improve agent performance, with a core focus on avoiding overfitting and achieving generalization.

LangChain Blog ·

Building a Financial Document Pipeline with LlamaParse

LlamaParse's 'agentic parsing' capability automatically transforms messy financial PDFs (like pay stubs and brokerage statements) into structured data and enables cross-document analysis, significantly boosting automation in workflows like loan underwriting.

LlamaIndex Blog ·

Building a Financial Due Diligence Agent with LiteParse

LlamaIndex demonstrates a financial due diligence AI agent built with just 600 lines of code and no vector database, leveraging LiteParse to extract PDF layout information for precise, highlighted source citations in answers.

LlamaIndex Blog ·

ChatGPT voice mode is a weaker model

Simon Willison points out that ChatGPT's voice mode actually runs on an older GPT-4o model, revealing AI companies' business strategy of deploying models of differing capability across product lines.

Simon Willison ·

Claude Fable 5 and Claude Mythos 5

Anthropic launches its most capable models yet, but for the first time splits them into a 'safe' general release and an 'unrestricted' limited-access one, signaling that safety control is becoming a core product feature as raw capability skyrockets.

Anthropic News ·

Continual learning for AI agents

Continual learning for AI agents is not just about updating model weights; crucial evolution happens at the 'harness' and 'context' layers, offering new ways to build truly personalized and growing agents.

LangChain Blog ·

CSP Allow-list Experiment

Simon Willison demonstrates an AI-built CSP sandbox experiment that manages security policies through dynamic interception and user authorization, revealing how AI-assisted development is changing the implementation of complex frontend security.

Simon Willison ·

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

The article clarifies the confusion around key AI Agent terms like Harness and Scaffolding, aiming to build a clear, shared mental model for the field.

Hugging Face Blog ·

Have your agent record video demos of its work with shot-scraper video

Simon Willison introduces shot-scraper video, a command that lets AI agents record web application demos via YAML scripts, signaling a shift in AI development toolchains from 'generating code' to 'generating verifiable deliverables.'

Simon Willison ·

How My Agents Self-Heal in Production

A LangChain engineer shares how they built a self-healing system where AI agents automatically detect deployment errors, analyze root causes, and submit code fixes, combining statistical methods with AI judgment to close the loop.

LangChain Blog ·

How we build evals for Deep Agents

The LangChain team shares their core philosophy for building AI agent evals: more tests don't mean better agents; the key is designing targeted, self-documenting evaluations that directly measure desired behaviors.

LangChain Blog ·

Human judgment in the agent improvement loop

LangChain explains the core challenge of building reliable AI Agents: integrating human experts' tacit knowledge and judgment into the development loop, not just relying on documented explicit knowledge.

LangChain Blog ·

Introducing Claude Opus 4.7

Anthropic releases Claude Opus 4.7, focusing on enhanced complex coding and long-running task capabilities, with its 'self-verification' mechanism marking a key step towards more autonomous AI agents.

Anthropic News ·

Introducing Claude Opus 4.8

Anthropic releases Claude Opus 4.8, with core breakthroughs in significantly improving the reliability, judgment, and long-running consistency of Agent tasks, marking AI's practical shift from 'usable' to 'trustworthy'.

Anthropic News ·

Is grep all you need? Lexical VS Semantic Search for Agents

The article explores the boundaries between traditional grep and semantic search/RAG for AI agents, highlighting grep's limitations with unstructured documents and at enterprise scale, and proposes a hybrid approach combining parsing tools.

LlamaIndex Blog ·

LlamaIndex Newsletter 2026-04-14

LlamaIndex launches ParseBench, the first OCR benchmark for AI agents, and demonstrates breakthroughs in structured document understanding and multimodal reasoning, signaling a shift from text extraction to deep semantic comprehension.

LlamaIndex Blog ·

LlamaIndex Newsletter 2026-04-21

LlamaIndex launches ParseBench, the first document OCR benchmark for AI agents, alongside new parsing tools and benchmark results, marking a shift towards quantifiable document intelligence.

LlamaIndex Blog ·

LlamaIndex Newsletter 5-19-26

LlamaIndex introduces ParseBench, the first OCR benchmark designed specifically for AI agents, alongside open-sourcing a local document parsing server and a secure sandboxed CLI agent, signaling a shift in document processing towards agent-native infrastructure.

LlamaIndex Blog ·

March 2026: LangChain Newsletter

LangChain is pushing agents from experimental prototypes to scalable, manageable enterprise assets through updates like LangSmith Fleet, Skills, and Sandboxes.

LangChain Blog ·

Anthropic acquires Stainless

Anthropic acquires core SDK tool provider Stainless to solve the 'last mile' problem of AI agent connectivity and strengthen its MCP protocol ecosystem.

Anthropic News ·

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Meta released Muse Spark, but the real story is its chat interface integrating 16 tools—web search, social media content search, code interpreter, etc.—building a complete AI agent workbench.

Simon Willison ·

OCR Accuracy Explained: What Impacts Performance and How to Improve It

OCR accuracy is not a single number, but a systems engineering problem determined by image quality, document complexity, evaluation metrics, and post-processing.

LlamaIndex Blog ·

Introducing Claude Sonnet 5

Anthropic's Sonnet 5 delivers agentic performance close to the Opus flagship at significantly lower cost, enabling developers to build powerful autonomous agents with mid-tier models.

Anthropic News ·

Quoting Paul Graham

Paul Graham observes that AI-written emails from founders, with their unnatural journalistic style and lack of authenticity, damage trust and highlight a core challenge of human communication in the AI era.

Simon Willison ·

Speculators v0.5.0: DFlash Support and Online Training

The Speculators v0.5.0 release introduces the DFlash algorithm for speculative decoding, which generates draft tokens in a single forward pass, significantly reducing inference latency, and unifies online and offline training workflows.

vLLM Blog ·

vLLM Tops the Artificial Analysis Leaderboard

The open-source inference engine vLLM has outperformed all proprietary competitors in deploying multiple frontier open-weight models, with its core optimization techniques like operator fusion publicly available, revealing the immense potential of open source in AI inference.

vLLM Blog ·

Your harness, your memory

The article argues that agent harnesses are inextricably tied to memory; using a closed or API-based harness means ceding control of your agent's memory to a third party, creating deep lock-in. Memory should be open.

LangChain Blog ·