Claude Opus 4.8: "a modest but tangible improvement"
Anthropic releases Claude Opus 4.8, focusing not on performance leaps but on significantly improving model 'honesty' — less hallucination, more willingness to admit uncertainty, which may be a more important direction than benchmark scores.
Simon Willison · May 29, 2026
Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor
Poolside's 33B-parameter agentic coding model, Laguna XS.2, achieves 2-3x inference speedup without quality loss through native vLLM integration, DFlash speculative decoding, and LLM Compressor quantization.
vLLM Blog · May 28, 2026
Native RL APIs in vLLM
vLLM introduces native Reinforcement Learning APIs to standardize weight synchronization and improve asynchronous training support, addressing key pain points of framework fragmentation and fragile deployments in online RL for large models.
vLLM Blog · May 28, 2026
Harness, Scaffold, and the AI Agent Terms Worth Getting Right
Hugging Face publishes an AI Agent glossary to clarify confusing and rapidly evolving terminology, providing developers with a clear mental model.
Hugging Face Blog · May 25, 2026
Quoting Armin Ronacher
Open-source maintainer Armin Ronacher highlights that AI-generated 'slop' issue reports are becoming a new burden for open-source communities, appearing professional but riddled with inaccuracies, wasting maintainers' time.
Simon Willison · May 25, 2026
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
NVIDIA's new diffusion language models generate tokens in parallel and refine them iteratively, potentially breaking the latency limits of traditional autoregressive models and enabling self-correction.
Hugging Face Blog · May 23, 2026
Datasette Agent
Simon Willison combines his LLM library with Datasette to create a conversational AI assistant that lets users query and visualize databases using natural language.
Simon Willison · May 22, 2026
Google I/O, Gemini Spark, Antigravity
Google announced its personal AI Agent, Gemini Spark, and the underlying Antigravity tooling, but the shift to closed-source and vague security promises foreshadow a battle over AI agent control and trust.
Simon Willison · May 20, 2026
Gemini 3.5 Flash: more expensive, but Google plan to use it for everything
Google released Gemini 3.5 Flash with a significant price hike, yet simultaneously deployed it across core products like Search and the Gemini app, revealing a shift from pure cost-effectiveness to paying for comprehensive model capabilities.
Simon Willison · May 20, 2026
OlmoEarth v1.1: A more efficient family of models
Allen AI releases OlmoEarth v1.1, reducing compute costs by up to 3x by optimizing token sequence length in transformer models for satellite imagery, while maintaining performance, making large-scale environmental monitoring AI more economically viable.
Hugging Face Blog · May 20, 2026
Introducing the Ettin Reranker Family
Hugging Face has released six Ettin reranker models of varying sizes, designed to significantly improve the accuracy of search and RAG systems at low cost through a 'retrieve-then-rerank' two-stage architecture.
Hugging Face Blog · May 19, 2026
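The 'retrieve-then-rerank' pattern mentioned above can be sketched in a few lines. This is only an illustration of the two-stage idea, not the Ettin models or any Hugging Face API: `cheap_score` and `careful_score` are hypothetical stand-ins for a fast first-stage retriever and a slower, more accurate reranker.

```python
from typing import Callable, List


def cheap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in (hypothetical): number of distinct words shared with the query."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return float(len(q & d))


def careful_score(query: str, doc: str) -> float:
    """Stage 2 stand-in (hypothetical): Jaccard overlap, which penalizes padded documents."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    retrieve_k: int = 2,
    final_k: int = 1,
    retriever: Callable[[str, str], float] = cheap_score,
    reranker: Callable[[str, str], float] = careful_score,
) -> List[str]:
    # Stage 1: score every document cheaply and keep only the top candidates.
    candidates = sorted(docs, key=lambda d: retriever(query, d), reverse=True)[:retrieve_k]
    # Stage 2: run the more expensive scorer on the small candidate set only.
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:final_k]


DOCS = [
    "search engines and many other unrelated words about reranker models",
    "reranker models improve search",
    "cooking pasta at home",
]
TOP = retrieve_then_rerank("reranker models search", DOCS)
```

The cost argument is that the expensive scorer only ever sees `retrieve_k` documents, however large the corpus is.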
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
PaddleOCR 3.5 adds a Transformers inference backend, enabling developers to seamlessly use its OCR and document parsing models within the Hugging Face ecosystem, lowering integration barriers for building applications like RAG.
Hugging Face Blog · May 18, 2026
The Open Agent Leaderboard
Hugging Face and IBM launch the Open Agent Leaderboard, shifting evaluation from standalone models to full agent systems (including tools, planning, memory), while measuring both performance and cost.
Hugging Face Blog · May 18, 2026
Not so locked in any more
AI coding agents are driving down the cost of code rewrites and migrations to near zero, fundamentally undermining the 'lock-in' effect of technology stacks and making technology choices more flexible and reversible.
Simon Willison · May 15, 2026
Quoting Mitchell Hashimoto
Mitchell Hashimoto observes that modern programming languages have become highly fungible, as demonstrated by Bun's rapid migration from Zig to Rust, signaling a shift from language lock-in to on-demand tool replacement.
Simon Willison · May 15, 2026
Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality
IBM releases two Apache 2.0 open-source multilingual embedding models, where the 97-million-parameter compact version outperforms all models of similar size on various benchmarks, demonstrating the huge potential of 'small but mighty' models for specific tasks.
Hugging Face Blog · May 15, 2026
Announcing VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models
VeRL-Omni is a reinforcement learning training framework designed for multimodal generative models, addressing the engineering challenges of efficient and stable RL training on diffusion and omni-modality models, extending the LLM RL training paradigm to image, video, and audio generation.
vLLM Blog · May 14, 2026
llm 0.32a2
The llm 0.32a2 release adds support for OpenAI's new /v1/responses endpoint, showing that model reasoning (especially between tool calls) is becoming central and that developers need to adapt to new interaction patterns.
Simon Willison · May 13, 2026
Thoughts on GitLab's workforce reduction and structural and strategic decisions
GitLab's radical restructuring reveals a deep trend: AI Agents are reducing software production costs, forcing companies to shift organizational structures from 'management-heavy' to 'small, autonomous delivery teams'.
Simon Willison · May 12, 2026
Quoting James Shore
James Shore warns that AI coding tools that only increase coding speed without reducing maintenance costs will lead to permanent technical debt inflation and "permanent indenture" for developers.
Simon Willison · May 12, 2026
Using LLM in the shebang line of a script
Simon Willison demonstrates integrating LLM tools into a script's shebang line, making natural language descriptions directly executable, signaling a major shift in programming interaction.
Simon Willison · May 12, 2026
Learning on the Shop floor
Shopify's CEO shares how their internal AI coding agent River, through a fully public collaboration model, transforms the entire company into a large-scale 'osmosis learning' workshop, revealing a novel paradigm for AI tool usage within organizations.
Simon Willison · May 11, 2026
Using Claude Code: The Unreasonable Effectiveness of HTML
A member of the Claude Code team argues that requesting output in HTML from AI is more effective than Markdown, leveraging its rich interactivity and visualization capabilities to significantly enhance clarity and user experience.
Simon Willison · May 9, 2026
CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models
A specialized 4B cybersecurity model matches or outperforms an 8B generalist on key tasks, revealing the trend towards 'small, specialized, and local' AI deployment in security.
Hugging Face Blog · May 9, 2026
MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required
A complete case study proving that developers can efficiently fine-tune large models on AMD MI300X GPUs through the seamless integration of the Hugging Face ecosystem and ROCm, breaking the ecosystem monopoly of NVIDIA CUDA.
Hugging Face Blog · May 8, 2026
Behind the Scenes Hardening Firefox with Claude Mythos Preview
Mozilla used the Claude Mythos Preview and advanced harnessing techniques to find and fix 423 Firefox security vulnerabilities in one month (a 20x increase over their average), marking a qualitative shift in AI security auditing from noise generation to high-value signal production.
Simon Willison · May 8, 2026
Live blog: Code w/ Claude 2026
Anthropic showcased a comprehensive shift from a single model to a platform-centric, multi-agent collaboration paradigm at Code w/ Claude, focusing on enabling developers to build and run complex, long-duration agent tasks more efficiently.
Simon Willison · May 6, 2026
Vibe coding and agentic engineering are getting closer than I'd like
Veteran developer Simon Willison finds that as AI coding agents become more reliable, his habit of reviewing every line of code is eroding, blurring the line between 'vibe coding' and professional 'agentic engineering' and raising deep concerns about responsibility for production code.
Simon Willison · May 6, 2026
TRE Python binding — ReDoS robustness demo
Simon Willison demonstrates how the TRE regex library is immune to ReDoS attacks that cripple Python's built-in re module, exposing the fatal flaw of traditional backtracking engines.
Simon Willison · May 5, 2026
Codex CLI 0.128.0 adds /goal
OpenAI's Codex CLI introduces a /goal command that enables the coding agent to automatically loop until a goal is met or token budget exhausted, signaling a shift from single-shot Q&A to persistent task execution.
Simon Willison · May 1, 2026
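The "loop until the goal is met or the token budget is exhausted" behavior described above can be pictured as a simple control loop. The sketch below is not Codex CLI's implementation; `run_until_goal`, `toy_step`, and the flat per-step token cost are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class LoopResult:
    goal_met: bool
    steps: int
    tokens_used: int


def run_until_goal(
    step: Callable[[Any], Tuple[Any, int]],
    goal_met: Callable[[Any], bool],
    token_budget: int,
) -> LoopResult:
    """Call `step` repeatedly until `goal_met` is true or the budget is spent."""
    state: Any = None
    steps = 0
    tokens_used = 0
    while tokens_used < token_budget:
        # Each step returns the new state and how many tokens it consumed.
        state, cost = step(state)
        tokens_used += cost
        steps += 1
        if goal_met(state):
            return LoopResult(True, steps, tokens_used)
    # Budget exhausted before the goal check passed.
    return LoopResult(False, steps, tokens_used)


def toy_step(state: Any) -> Tuple[int, int]:
    """Hypothetical agent step: advance a counter, spending 100 tokens."""
    return (state or 0) + 1, 100


finished = run_until_goal(toy_step, lambda s: s >= 3, token_budget=1000)
gave_up = run_until_goal(toy_step, lambda s: s >= 3, token_budget=200)
```

In this toy run, `finished` stops after three steps, while `gave_up` runs out of budget after two.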
We need RSS for sharing abundant vibe-coded apps
As AI lowers the barrier to app development, leading to a surge in personal, fragmented 'vibe-coded' apps, we need a new paradigm for app distribution and management, akin to RSS for blogs.
Simon Willison · May 1, 2026
LLM 0.32a0 is a major backwards-compatible refactor
Simon Willison's LLM library undergoes a major refactor, evolving from simple text prompts/responses to a structure supporting multi-turn message sequences and streaming mixed-type responses, adapting to modern LLMs' multimodal and tool-calling capabilities.
Simon Willison · Apr 30, 2026
DeepInfra on Hugging Face Inference Providers 🔥
Hugging Face integrates the cost-effective inference platform DeepInfra into its Inference Providers ecosystem, offering developers more model choices, flexible billing, and a unified API.
Hugging Face Blog · Apr 29, 2026
Quoting Matthew Yglesias
Matthew Yglesias's quote highlights two paths for AI-assisted programming: personal 'vibecoding' versus professional software companies using AI to build better products, with the latter being the more sustainable value creation model.
Simon Willison · Apr 28, 2026
What's new in pip 26.1 - lockfiles and dependency cooldowns!
pip 26.1 introduces native lockfiles (pylock.toml) and a dependency cooldown feature, aiming to enhance supply chain security and reproducibility in the Python ecosystem by locking dependency versions and avoiding overly new packages.
Simon Willison · Apr 28, 2026
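The idea behind a dependency cooldown, skipping releases that are "too new", can be sketched as a date filter. This is a conceptual illustration only: it does not show pip's actual options or resolver logic, and `eligible_releases`, `pick_latest`, and the sample versions and upload dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def eligible_releases(
    releases: Dict[str, datetime], now: datetime, cooldown_days: int
) -> Dict[str, datetime]:
    """Keep only releases uploaded at least `cooldown_days` before `now`."""
    cutoff = now - timedelta(days=cooldown_days)
    return {version: uploaded for version, uploaded in releases.items() if uploaded <= cutoff}


def pick_latest(
    releases: Dict[str, datetime], now: datetime, cooldown_days: int
) -> Optional[str]:
    """Return the most recently uploaded release that has passed the cooldown."""
    ok = eligible_releases(releases, now, cooldown_days)
    if not ok:
        return None
    return max(ok, key=lambda version: ok[version])


# Hypothetical upload history for one package.
RELEASES = {
    "1.0": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "1.1": datetime(2026, 4, 10, tzinfo=timezone.utc),
    "1.2": datetime(2026, 4, 27, tzinfo=timezone.utc),
}
NOW = datetime(2026, 4, 28, tzinfo=timezone.utc)

chosen = pick_latest(RELEASES, NOW, cooldown_days=7)
```

With a 7-day cooldown, the day-old "1.2" is skipped in favor of "1.1", which gives the ecosystem time to notice a compromised upload before it is installed.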
microsoft/VibeVoice
Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.
Simon Willison · Apr 28, 2026
How to build scalable web apps with OpenAI's Privacy Filter
OpenAI has open-sourced a high-performance PII detection model, and when combined with the Gradio Server framework, developers can quickly build web applications that handle sensitive information, marking a shift where privacy protection is becoming a standard part of AI application development.
Hugging Face Blog · Apr 27, 2026
OpenAI's 'Unification' Ambition: GPT-5.5 Bids Farewell to Dedicated Code Models, Moving Towards General Agents
An OpenAI executive confirms GPT-5.5 will not have a dedicated code version, signaling that large models are moving from specialized capabilities to unified, general-purpose agent systems.
Simon Willison · Apr 25, 2026
GPT-5.5 prompting guide
OpenAI's official prompting guide for GPT-5.5 emphasizes it is not a drop-in replacement for GPT-5.2/5.4, requiring a fresh start in prompt engineering for optimal results.
Simon Willison · Apr 25, 2026
DeepSeek V4 - almost on the frontier, a fraction of the price
DeepSeek's V4 series delivers near-frontier performance at a fraction of the cost (Pro at $1.74/M input, Flash at just $0.14/M), potentially reshaping the cost-effectiveness standard for open-weight models.
Simon Willison · Apr 24, 2026
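To put the quoted prices in perspective, a small calculation helps. The sketch below uses only the two per-million figures from the summary above and assumes both are input-token prices; output-token pricing is not given here and is left out. The function and dictionary names are hypothetical.

```python
# Prices quoted in the summary above, in USD per million tokens.
# Assumption: both figures refer to input tokens.
PRICE_PER_MILLION_USD = {"pro": 1.74, "flash": 0.14}


def input_cost_usd(tokens: int, tier: str) -> float:
    """Cost of sending `tokens` input tokens to the given tier."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[tier]


# Example workload: 10 million input tokens.
pro_cost = input_cost_usd(10_000_000, "pro")
flash_cost = input_cost_usd(10_000_000, "flash")
price_ratio = PRICE_PER_MILLION_USD["pro"] / PRICE_PER_MILLION_USD["flash"]
```

For that workload, the Pro tier comes to about $17.40 and Flash to about $1.40, roughly a 12x difference.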
DeepSeek V4 in vLLM: Efficient Long-context Attention
vLLM announces support for DeepSeek V4 models, featuring a novel attention mechanism that tackles the core challenges of memory and computational cost in million-token long-context inference.
vLLM Blog · Apr 24, 2026
Extract PDF text in your browser with LiteParse for the web
Simon Willison adapted LlamaIndex's LiteParse into a pure browser-based version, enabling local PDF text extraction and OCR without a server, highlighting privacy and the importance of spatial text parsing.
Simon Willison · Apr 24, 2026
A pelican for GPT-5.5 via the semi-official Codex backdoor API
Although OpenAI's latest model GPT-5.5 hasn't officially launched its API, developers are already accessing it through a 'semi-official backdoor' in its Codex CLI using their ChatGPT subscription, revealing new dynamics in the battle over AI model distribution channels.
Simon Willison · Apr 24, 2026
How to Use Transformers.js in a Chrome Extension
Hugging Face shares a practical architecture for running AI models locally in Chrome extensions, revealing key design patterns for model deployment, messaging, and frontend-backend separation under Manifest V3.
Hugging Face Blog · Apr 23, 2026
Gemma 4 VLA Demo on Jetson Orin Nano Super
An end-to-end multimodal agent demo running on NVIDIA Jetson Orin Nano Super, showcasing how the model autonomously decides when to use the camera and answers questions with visual context, signaling the descent of powerful AI capabilities to edge devices.
Hugging Face Blog · Apr 22, 2026
Quoting Bobby Holley
Mozilla's CTO reports that using Anthropic's Claude AI, Firefox identified and fixed 271 vulnerabilities in an assessment, marking a shift where AI moves from an 'assistant' to a 'lead' role in security defense.
Simon Willison · Apr 22, 2026
Changes to GitHub Copilot Individual plans
GitHub Copilot tightens its individual plan due to the massive compute demands of AI agent workflows, halting sign-ups and restricting top models, signaling the unsustainability of per-request pricing in the agent era.
Simon Willison · Apr 22, 2026
AI Agents Are Too Human? A Counter-Intuitive Critique and Its Deeper Implications
An expert critiques current AI agents for being too 'human'—lacking rigor, patience, and focus, and tending to compromise when faced with difficulties, revealing fundamental flaws in their design.
Simon Willison · Apr 22, 2026
Claude Token Counter, now with model comparisons
Simon Willison's tool reveals that Claude Opus 4.7's new tokenizer inflates token counts by ~46% for text and up to 3x for images compared to its predecessor, leading to higher real-world costs despite unchanged official pricing.
Simon Willison · Apr 20, 2026
Claude system prompts as a git timeline
Simon Willison transformed Anthropic's published Claude system prompt history into a Git-based tool, enabling developers to trace prompt evolution like code changes, revealing a new paradigm for AI behavior debugging and understanding.
Simon Willison · Apr 18, 2026
Adding a new content type to my blog-to-newsletter tool
Simon Willison demonstrates an efficient prompt that enabled an AI coding assistant to complete a complex feature extension in one shot, revealing the core Agentic engineering pattern of 'explaining requirements with code'.
Simon Willison · Apr 18, 2026
Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
Simon Willison's famous 'pelican riding a bicycle' benchmark surprisingly shows a locally-run, smaller Alibaba Qwen3.6 model outperforming the cloud-based, massive Claude Opus 4.7 in creative SVG generation, revealing the surprising potential of open-source models for specific tasks.
Simon Willison · Apr 17, 2026
When Developers Use AI to "Build" Tools: Insights from Simon Willison's Datasette News Previewer
Renowned developer Simon Willison shares how he used Claude AI to quickly build a YAML news preview tool for the Datasette project, demonstrating a new paradigm for AI-assisted development.
Simon Willison · Apr 16, 2026
Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
This work extends reinforcement learning environments from logic puzzles to e-commerce conversations, using 8 algorithmically verifiable scenarios to train AI agents from 'chatting well' to 'getting things done'.
Hugging Face Blog · Apr 16, 2026
The PR you would have opened yourself
Hugging Face introduces an AI-assisted tool for porting models from the transformers library to MLX, exposing the core tension in open-source maintenance in the code-agent era: a surge in contributions versus code quality and community communication costs.
Hugging Face Blog · Apr 16, 2026
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers
Hugging Face releases a new tutorial demonstrating how fine-tuned multimodal embedding models can far surpass general-purpose large models in specific domains (like visual document retrieval), even outperforming models with 4x the parameters.
Hugging Face Blog · Apr 16, 2026
Gemini 3.1 Flash TTS
Google's Gemini 3.1 Flash TTS is revolutionary because it uses detailed, screenplay-like prompts to precisely control emotion, accent, pace, and scene in speech synthesis, marking a shift from a 'tool' to a 'creative partner'.
Simon Willison · Apr 16, 2026
Trusted access for the next era of cyber defense
OpenAI launches GPT-5.4-Cyber, a model fine-tuned for defensive cybersecurity, and its "Trusted Access" program, signaling that leading AI companies are making cybersecurity a key battleground while seeking a new balance between safety and openness.
Simon Willison · Apr 15, 2026
The problem is that LLMs inherently lack the virtue of laziness
Bryan Cantrill argues that LLMs lack the human laziness that forces us to create elegant abstractions, and that without this constraint AI will make systems larger, not better.
Simon Willison · Apr 13, 2026
Your harness, your memory
LangChain's CEO argues that agent harnesses are inextricably tied to memory, and that using a closed harness means ceding control of your memory to a third party, creating significant lock-in.
LangChain Blog · Apr 11, 2026
Meta's new model is Muse Spark, and meta.ai chat has some interesting tools
Simon Willison discovered 16 hidden tools behind meta.ai, including browser search, cross-platform content search, and Python execution, revealing a trend of AI chat interfaces evolving into tool collections.
Simon Willison · Apr 9, 2026
Better Harness: A Recipe for Harness Hill-Climbing with Evals
LangChain argues that building better AI agents hinges on improving their 'harness' rather than the model itself, and shares a systematic method using evals as training signals for iterative improvement.
LangChain Blog · Apr 9, 2026
Deep Agents v0.5
LangChain introduces async subagents for its Deep Agents framework, enabling parallel task delegation and removing blocking bottlenecks in agent workflows.
LangChain Blog · Apr 8, 2026
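Parallel delegation to subagents without blocking can be illustrated with plain `asyncio`. This is not the Deep Agents API: `subagent` and `supervisor` are hypothetical names, and `asyncio.sleep` stands in for real model or tool calls.

```python
import asyncio
from typing import List, Tuple


async def subagent(name: str, delay: float) -> str:
    """Hypothetical subagent: simulates work by sleeping, then reports back."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def supervisor(tasks: List[Tuple[str, float]]) -> List[str]:
    """Start all subagents at once and wait for every result."""
    coroutines = [subagent(name, delay) for name, delay in tasks]
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*coroutines)


results = asyncio.run(supervisor([("research", 0.02), ("summarize", 0.01)]))
```

Because the subagents run concurrently, total wall-clock time is close to the slowest task rather than the sum of all tasks.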
Eight years of wanting, three months of building with AI
Lalit Maganti's experience reveals both the potential and the limitations of AI in software development, particularly the challenges of architectural design.
Simon Willison · Apr 6, 2026
Continual learning for AI agents
Continual learning for AI agents occurs at three layers: model, harness, and context, with context-layer evolution being the most practical and actionable.
LangChain Blog · Apr 6, 2026
How My Agents Self-Heal in Production
A LangChain engineer shares a complete pipeline for AI agents to automatically detect regressions, diagnose issues, and submit fix PRs after deployment, combining statistical methods with intelligent triage to reduce false positives.
LangChain Blog · Apr 4, 2026
Quoting Kyle Daigle
GitHub's COO reveals 1B commits in 2025 and GitHub Actions usage doubling annually, signaling exponential growth in developer activity.
Simon Willison · Apr 4, 2026
Gemma 4: Byte for byte, the most capable open models
Google DeepMind's Gemma 4 models innovate in parameter efficiency and support multi-modal inputs, marking a significant advancement in research on small effective models.
Simon Willison · Apr 3, 2026
Open Models have crossed a threshold
LangChain's evaluations show that open models like GLM-5 and MiniMax M2.7 now match closed frontier models on core agent tasks such as file operations and tool use, at a fraction of the cost and with lower latency.
LangChain Blog · Apr 3, 2026
Welcome Gemma 4: Frontier multimodal intelligence on device
Gemma 4 introduces enhanced multimodal capabilities, supporting image, text, and audio inputs, significantly improving model intelligence and deployment flexibility across devices.
Hugging Face Blog · Apr 2, 2026
March 2026: LangChain Newsletter
LangChain is pushing AI agents from experimental prototypes to manageable, collaborative, and securely deployable enterprise productivity tools through features like LangSmith Fleet, Skills, and Sandboxes.
LangChain Blog · Apr 2, 2026
Any Custom Frontend with Gradio's Backend
The introduction of Gradio.Server allows developers to use custom frontend frameworks while enjoying the robust backend support of Gradio, significantly enhancing application development flexibility and efficiency.
Hugging Face Blog · Apr 1, 2026
Announcing the LangChain + MongoDB Partnership: The AI Agent Stack That Runs On The Database You Already Trust
LangChain and MongoDB's deep integration transforms Atlas into a unified AI agent backend for vector search, persistent memory, data querying, and observability, aiming to solve data architecture fragmentation from prototype to production.
LangChain Blog · Apr 1, 2026
TRL v1.0: Post-Training Library Built to Move with the Field
The release of TRL v1.0 marks a significant shift in post-training libraries, designed to cope with the rapidly changing AI landscape while offering a stable yet experimental development environment.
Hugging Face Blog · Mar 31, 2026
Liberate your OpenClaw
With restrictions on Claude models in open agent platforms, Hugging Face offers two ways to help users quickly migrate and revive their OpenClaw agents, ensuring continued use of efficient open models.
Hugging Face Blog · Mar 27, 2026
How we build evals for Deep Agents
LangChain shares its core philosophy for building AI agent evaluation systems: more evals aren't better; instead, precisely define and measure the agent behaviors you care about to guide its evolution.
LangChain Blog · Mar 26, 2026
Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines
Modular Diffusers offers composable building blocks for easily creating customized diffusion pipelines, greatly enhancing flexibility and reusability.
Hugging Face Blog · Mar 5, 2026
Building News Agents for Daily News Recaps with MCP, Q, and tmux
The author shares how to build a multi-agent system using MCP and Q tools to automate daily news recap generation, showcasing the practical potential of new workflows.
Eugene Yan · May 4, 2025
LLM Powered Autonomous Agents
LLM powered autonomous agents combine planning, memory, and tool usage, showcasing their potential in handling complex tasks and indicating a significant shift in work methodologies.
Lilian Weng · Jun 23, 2023
Building a Financial Document Pipeline with LlamaParse
LlamaParse's 'agentic parsing' capability automatically transforms messy financial PDFs (like pay stubs and brokerage statements) into structured data and enables cross-document analysis, significantly boosting automation in workflows like loan underwriting.
LlamaIndex Blog ·
Building a Financial Due Diligence Agent with LiteParse
LlamaIndex demonstrates a financial due diligence AI agent built with just 600 lines of code and no vector database, leveraging LiteParse to extract PDF layout information for precise, highlighted source citations in answers.
LlamaIndex Blog ·
Introducing Claude Opus 4.7
Anthropic's Claude Opus 4.7 release focuses on enhanced reliability for complex, long-running tasks and self-verification capabilities, signaling a shift from AI as a tool to a trustworthy work partner.
Anthropic News ·
Introducing Claude Opus 4.8
Anthropic releases Claude Opus 4.8, significantly improving the reliability, judgment, and long-running consistency of agent tasks and marking AI's practical shift from 'usable' to 'trustworthy'.
Anthropic News ·
Introducing ParseBench: The First Document Parsing Benchmark for AI Agents
LlamaIndex releases ParseBench, the first document parsing benchmark designed for AI Agents, revealing that the traditional OCR standard of 'human-readable' is insufficient for agents' strict requirement of 'absolute correctness'.
LlamaIndex Blog ·
Is grep all you need? Lexical VS Semantic Search for Agents
The article explores the pros and cons of traditional text search tools like grep versus semantic search (RAG) in the AI Agent era, highlighting grep's limitations with unstructured documents and large-scale corpora, and proposes hybrid solutions.
LlamaIndex Blog ·
LlamaIndex Newsletter 2026-04-14
LlamaIndex releases ParseBench, the first OCR benchmark for AI agents, alongside tools tackling structural loss and security in document parsing, marking a paradigm shift from text extraction to contextual understanding.
LlamaIndex Blog ·
LlamaIndex Newsletter 2026-04-21
LlamaIndex launches ParseBench, the first document OCR benchmark for AI agents, alongside new parsing tools and benchmark results, marking a shift towards quantifiable document intelligence.
LlamaIndex Blog ·
LlamaIndex Newsletter 5-19-26
LlamaIndex introduces ParseBench, the first OCR benchmark designed specifically for AI agents, alongside open-sourcing a local document parsing server and a secure sandboxed CLI agent, signaling a shift in document processing towards agent-native infrastructure.
LlamaIndex Blog ·
Anthropic acquires Stainless
Anthropic acquires SDK toolmaker Stainless to strengthen AI Agent connectivity with external tools and data, signaling a shift in competition from models to Agent ecosystem building.
Anthropic News ·
OCR Accuracy Explained: What Impacts Performance and How to Improve It
OCR accuracy is not a single number but a multi-layered issue spanning characters, words, and semantic fields. Its real-world performance is impacted by image quality, document type, and hardware, and improving it requires building a complete processing pipeline.
LlamaIndex Blog ·
OCR for Tables: How to Extract Structured Data from Documents
The article delves into the challenges of extracting table data from documents, highlighting that it's not just about character recognition, but also involves layout analysis, structural reconstruction, and contextual reasoning, marking a key step towards intelligent document processing.
LlamaIndex Blog ·