
Tag: Developer Tools (91 articles)

Claude Opus 4.8: "a modest but tangible improvement"

Anthropic releases Claude Opus 4.8, focusing not on performance leaps but on significantly improving model 'honesty' — less hallucination, more willingness to admit uncertainty, which may be a more important direction than benchmark scores.

Simon Willison · May 29, 2026

Native RL APIs in vLLM

vLLM introduces native Reinforcement Learning APIs to standardize weight synchronization and improve asynchronous training support, addressing key pain points of framework fragmentation and fragile deployments in online RL for large models.

vLLM Blog · May 28, 2026

Quoting Armin Ronacher

Open-source maintainer Armin Ronacher highlights that AI-generated 'slop' issue reports are becoming a new burden for open-source communities, appearing professional but riddled with inaccuracies, wasting maintainers' time.

Simon Willison · May 25, 2026

Datasette Agent

Simon Willison combines his LLM library with Datasette to create a conversational AI assistant that lets users query and visualize databases using natural language.

Simon Willison · May 22, 2026

Google I/O, Gemini Spark, Antigravity

Google announced its personal AI Agent, Gemini Spark, and the underlying Antigravity tooling, but the shift to closed-source and vague security promises foreshadow a battle over AI agent control and trust.

Simon Willison · May 20, 2026

OlmoEarth v1.1: A more efficient family of models

Allen AI releases OlmoEarth v1.1, reducing compute costs by up to 3x by optimizing token sequence length in transformer models for satellite imagery, while maintaining performance, making large-scale environmental monitoring AI more economically viable.

Hugging Face Blog · May 20, 2026

Introducing the Ettin Reranker Family

Hugging Face has released six Ettin reranker models of varying sizes, designed to significantly improve the accuracy of search and RAG systems at low cost through a 'retrieve-then-rerank' two-stage architecture.

Hugging Face Blog · May 19, 2026
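
The 'retrieve-then-rerank' pattern named in the Ettin entry above can be illustrated with a toy sketch. This is not the Ettin API: the scoring functions, the document list, and every name below are assumptions made for illustration. A cheap scorer narrows the corpus to a few candidates, then a costlier scorer (standing in for a reranker model) orders only those candidates.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, for illustration only.
DOCS = [
    "vLLM adds native reinforcement learning APIs",
    "Rerankers improve search accuracy in RAG systems",
    "pip 26.1 introduces lockfiles",
    "Search engines retrieve candidates before reranking them",
]

def tokenize(text):
    return text.lower().split()

def cheap_score(query, doc):
    """Stage 1: shared-token count (a stand-in for BM25 or embedding retrieval)."""
    query_tokens = set(tokenize(query))
    return sum(1 for tok in tokenize(doc) if tok in query_tokens)

def expensive_score(query, doc):
    """Stage 2: cosine similarity over token counts (a toy stand-in for a reranker model)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def retrieve_then_rerank(query, docs, k_retrieve=3, k_final=1):
    # Stage 1 runs over the whole corpus but is cheap per document.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k_retrieve]
    # Stage 2 is costlier per document, so it only sees the short candidate list.
    reranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:k_final]

top = retrieve_then_rerank("search accuracy in RAG", DOCS)
print(top)
```

The design point is cost: the expensive scorer runs k_retrieve times per query instead of once per document in the corpus.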

The Open Agent Leaderboard

Hugging Face and IBM launch the Open Agent Leaderboard, shifting evaluation from standalone models to full agent systems (including tools, planning, memory), while measuring both performance and cost.

Hugging Face Blog · May 18, 2026

Not so locked in any more

AI coding agents are driving down the cost of code rewrites and migrations to near zero, fundamentally undermining the 'lock-in' effect of technology stacks and making technology choices more flexible and reversible.

Simon Willison · May 15, 2026

Quoting Mitchell Hashimoto

Mitchell Hashimoto observes that modern programming languages have become highly fungible, as demonstrated by Bun's rapid migration from Zig to Rust, signaling a shift from language lock-in to on-demand tool replacement.

Simon Willison · May 15, 2026

llm 0.32a2

The LLM tool update supporting OpenAI's new /v1/responses endpoint reveals that AI model reasoning capabilities (especially between tool calls) are becoming core, and developers need to adapt to new interaction patterns.

Simon Willison · May 13, 2026

Quoting James Shore

James Shore warns that AI coding tools that only increase coding speed without reducing maintenance costs will lead to permanent technical debt inflation and "permanent indenture" for developers.

Simon Willison · May 12, 2026

Using LLM in the shebang line of a script

Simon Willison demonstrates integrating LLM tools into a script's shebang line, making natural language descriptions directly executable, signaling a major shift in programming interaction.

Simon Willison · May 12, 2026

Learning on the Shop floor

Shopify's CEO shares how their internal AI coding agent River, through a fully public collaboration model, transforms the entire company into a large-scale 'osmosis learning' workshop, revealing a novel paradigm for AI tool usage within organizations.

Simon Willison · May 11, 2026

Using Claude Code: The Unreasonable Effectiveness of HTML

A member of the Claude Code team argues that requesting output in HTML from AI is more effective than Markdown, leveraging its rich interactivity and visualization capabilities to significantly enhance clarity and user experience.

Simon Willison · May 9, 2026

Behind the Scenes Hardening Firefox with Claude Mythos Preview

Mozilla used the Claude Mythos Preview and advanced harnessing techniques to find and fix 423 Firefox security vulnerabilities in one month, a 20x increase over its average, marking a qualitative shift in AI security auditing from noise generation to high-value signal production.

Simon Willison · May 8, 2026

Live blog: Code w/ Claude 2026

Anthropic showcased a comprehensive shift from a single model to a platform-centric, multi-agent collaboration paradigm at Code w/ Claude, focusing on enabling developers to build and run complex, long-duration agent tasks more efficiently.

Simon Willison · May 6, 2026

Vibe coding and agentic engineering are getting closer than I'd like

Veteran developer Simon Willison finds that as AI coding agents become more reliable, his habit of reviewing every line of code is eroding, blurring the line between 'vibe coding' and professional 'agentic engineering' and raising deep concerns about responsibility for production code.

Simon Willison · May 6, 2026

TRE Python binding — ReDoS robustness demo

Simon Willison demonstrates how the TRE regex library is immune to ReDoS attacks that cripple Python's built-in re module, exposing the fatal flaw of traditional backtracking engines.

Simon Willison · May 5, 2026
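
To show why backtracking engines are exposed to ReDoS, as the entry above describes, here is a simplified model of how a backtracker might explore the classic pattern (a+)+$ against a string of n 'a' characters followed by 'b'. It is a sketch written for illustration, not the actual implementation of Python's re module or of TRE; the function name and the step-counting scheme are assumptions.

```python
def backtracking_steps(n_as):
    """Count the attempts a simplified backtracker makes for (a+)+$ on 'a' * n_as + 'b'."""
    text = "a" * n_as + "b"
    steps = 0

    def match_groups(pos):
        nonlocal steps
        # Length of the run of 'a' characters starting at pos.
        run = 0
        while pos + run < len(text) and text[pos + run] == "a":
            run += 1
        # Greedy: try the longest a+ first, then back off to shorter ones.
        for k in range(run, 0, -1):
            steps += 1
            if pos + k == len(text):   # '$' would succeed here
                return True
            if match_groups(pos + k):  # try another repetition of the group
                return True
        return False

    match_groups(0)
    return steps

for n in (4, 8, 12, 16):
    print(n, backtracking_steps(n))
```

In this model the trailing 'b' guarantees failure, so every way of splitting the run of 'a' characters into groups is tried, and the step count is 2^n - 1. Each added character doubles the work, which is the growth pattern that makes such inputs dangerous.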

Codex CLI 0.128.0 adds /goal

OpenAI's Codex CLI introduces a /goal command that enables the coding agent to automatically loop until a goal is met or token budget exhausted, signaling a shift from single-shot Q&A to persistent task execution.

Simon Willison · May 1, 2026
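
The loop described in the Codex CLI entry above (keep working until the goal is met or the token budget runs out) can be sketched as follows. This is not Codex CLI's code: StepResult, run_until_goal, and fake_step are hypothetical names, and a real agent turn would call a model rather than a stub.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    tokens_used: int
    goal_met: bool

def run_until_goal(step, token_budget):
    """Call step() repeatedly until it reports goal_met or the budget is spent."""
    spent = 0
    iterations = 0
    while spent < token_budget:
        result = step(iterations)
        spent += result.tokens_used
        iterations += 1
        if result.goal_met:
            return "goal_met", iterations, spent
    return "budget_exhausted", iterations, spent

def fake_step(i):
    # Hypothetical agent turn: costs 100 tokens and succeeds on the third attempt.
    return StepResult(tokens_used=100, goal_met=(i == 2))

print(run_until_goal(fake_step, token_budget=1000))
print(run_until_goal(fake_step, token_budget=200))
```

The budget check sits at the top of the loop, so a turn that starts under budget is allowed to finish even if it pushes spending past the limit; a stricter variant would estimate the cost before each turn.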

We need RSS for sharing abundant vibe-coded apps

As AI lowers the barrier to app development, leading to a surge in personal, fragmented 'vibe-coded' apps, we need a new paradigm for app distribution and management, akin to RSS for blogs.

Simon Willison · May 1, 2026

LLM 0.32a0 is a major backwards-compatible refactor

Simon Willison's LLM library undergoes a major refactor, evolving from simple text prompts/responses to a structure supporting multi-turn message sequences and streaming mixed-type responses, adapting to modern LLMs' multimodal and tool-calling capabilities.

Simon Willison · Apr 30, 2026

DeepInfra on Hugging Face Inference Providers 🔥

Hugging Face integrates the cost-effective inference platform DeepInfra into its Inference Providers ecosystem, offering developers more model choices, flexible billing, and a unified API.

Hugging Face Blog · Apr 29, 2026

Quoting Matthew Yglesias

Matthew Yglesias's quote highlights two paths for AI-assisted programming: personal 'vibecoding' versus professional software companies using AI to build better products, with the latter being the more sustainable value creation model.

Simon Willison · Apr 28, 2026

What's new in pip 26.1 - lockfiles and dependency cooldowns!

pip 26.1 introduces native lockfiles (pylock.toml) and a dependency cooldown feature, aiming to enhance supply chain security and reproducibility in the Python ecosystem by locking dependency versions and avoiding overly new packages.

Simon Willison · Apr 28, 2026
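
The dependency cooldown idea in the pip entry above (avoid releases that are too new) can be sketched as a date filter. This is not pip's implementation or its command-line interface: the function name, the release data, and the rule of picking the most recently uploaded eligible release are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def pick_with_cooldown(releases, now, cooldown_days):
    """Return the most recently uploaded version that is at least cooldown_days old, or None."""
    cutoff = now - timedelta(days=cooldown_days)
    eligible = [(uploaded, version) for version, uploaded in releases.items() if uploaded <= cutoff]
    if not eligible:
        return None
    return max(eligible)[1]

# Hypothetical upload dates for an imaginary package.
RELEASES = {
    "1.0": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "1.1": datetime(2026, 4, 1, tzinfo=timezone.utc),
    "1.2": datetime(2026, 4, 27, tzinfo=timezone.utc),
}
NOW = datetime(2026, 4, 28, tzinfo=timezone.utc)

print(pick_with_cooldown(RELEASES, NOW, cooldown_days=7))
print(pick_with_cooldown(RELEASES, NOW, cooldown_days=0))
```

With a seven-day cooldown the day-old 1.2 release is skipped in favor of 1.1, which gives a compromised upload time to be noticed before it is installed.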

microsoft/VibeVoice

Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.

Simon Willison · Apr 28, 2026

How to build scalable web apps with OpenAI's Privacy Filter

OpenAI has open-sourced a high-performance PII detection model; combined with the Gradio Server framework, it lets developers quickly build web applications that handle sensitive information, a sign that privacy protection is becoming a standard part of AI application development.

Hugging Face Blog · Apr 27, 2026

GPT-5.5 prompting guide

OpenAI's official prompting guide for GPT-5.5 emphasizes it is not a drop-in replacement for GPT-5.2/5.4, requiring a fresh start in prompt engineering for optimal results.

Simon Willison · Apr 25, 2026

DeepSeek V4 in vLLM: Efficient Long-context Attention

vLLM announces support for DeepSeek V4 models, featuring a novel attention mechanism that tackles the core challenges of memory and computational cost in million-token long-context inference.

vLLM Blog · Apr 24, 2026

Extract PDF text in your browser with LiteParse for the web

Simon Willison adapted LlamaIndex's LiteParse into a pure browser-based version, enabling local PDF text extraction and OCR without a server, highlighting privacy and the importance of spatial text parsing.

Simon Willison · Apr 24, 2026

A pelican for GPT-5.5 via the semi-official Codex backdoor API

Although OpenAI's latest model GPT-5.5 hasn't officially launched its API, developers are already accessing it through a 'semi-official backdoor' in its Codex CLI using their ChatGPT subscription, revealing new dynamics in the battle over AI model distribution channels.

Simon Willison · Apr 24, 2026

How to Use Transformers.js in a Chrome Extension

Hugging Face shares a practical architecture for running AI models locally in Chrome extensions, revealing key design patterns for model deployment, messaging, and frontend-backend separation under Manifest V3.

Hugging Face Blog · Apr 23, 2026

Gemma 4 VLA Demo on Jetson Orin Nano Super

An end-to-end multimodal agent demo running on NVIDIA Jetson Orin Nano Super shows the model deciding autonomously when to use the camera and answering questions with visual context, signaling the arrival of powerful AI capabilities on edge devices.

Hugging Face Blog · Apr 22, 2026

Quoting Bobby Holley

Mozilla's CTO reports that using Anthropic's Claude AI, Firefox identified and fixed 271 vulnerabilities in an assessment, marking a shift where AI moves from an 'assistant' to a 'lead' role in security defense.

Simon Willison · Apr 22, 2026

Changes to GitHub Copilot Individual plans

GitHub Copilot tightens its individual plan due to the massive compute demands of AI agent workflows, halting sign-ups and restricting top models, signaling the unsustainability of per-request pricing in the agent era.

Simon Willison · Apr 22, 2026

Claude Token Counter, now with model comparisons

Simon Willison's tool reveals that Claude Opus 4.7's new tokenizer inflates token counts by ~46% for text and up to 3x for images compared to its predecessor, leading to higher real-world costs despite unchanged official pricing.

Simon Willison · Apr 20, 2026
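
The cost effect described in the entry above is simple arithmetic: if the per-token price stays the same and the same input becomes more tokens, the bill scales by the same factor. The sketch below uses the ~46% (text) and 3x (image) figures from the summary; the price of 10.0 per million tokens and the function name are made-up assumptions, not Anthropic's pricing.

```python
def effective_cost(old_tokens, price_per_million, inflation):
    """Cost of the same input after the tokenizer multiplies its token count by `inflation`."""
    new_tokens = old_tokens * inflation
    return new_tokens * price_per_million / 1_000_000

PRICE = 10.0  # hypothetical price per million tokens

baseline = effective_cost(1_000_000, PRICE, inflation=1.0)
text_cost = effective_cost(1_000_000, PRICE, inflation=1.46)
image_cost = effective_cost(1_000_000, PRICE, inflation=3.0)

print(f"baseline: {baseline:.2f}, text: {text_cost:.2f}, image (worst case): {image_cost:.2f}")
```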

Claude system prompts as a git timeline

Simon Willison transformed Anthropic's published Claude system prompt history into a Git-based tool, enabling developers to trace prompt evolution like code changes, revealing a new paradigm for AI behavior debugging and understanding.

Simon Willison · Apr 18, 2026

Adding a new content type to my blog-to-newsletter tool

Simon Willison demonstrates an efficient prompt that enabled an AI coding assistant to complete a complex feature extension in one shot, revealing the core agentic engineering pattern of 'explaining requirements with code'.

Simon Willison · Apr 18, 2026

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

Simon Willison's famous 'pelican riding a bicycle' benchmark surprisingly shows a locally-run, smaller Alibaba Qwen3.6 model outperforming the cloud-based, massive Claude Opus 4.7 in creative SVG generation, revealing the surprising potential of open-source models for specific tasks.

Simon Willison · Apr 17, 2026

The PR you would have opened yourself

Hugging Face introduces an AI-assisted tool for porting models from the transformers library to MLX, revealing the core contradiction in open-source maintenance during the code agent era: the surge in contributions versus code quality and community communication costs.

Hugging Face Blog · Apr 16, 2026

Gemini 3.1 Flash TTS

Google's Gemini 3.1 Flash TTS is revolutionary because it uses detailed, screenplay-like prompts to precisely control emotion, accent, pace, and scene in speech synthesis, marking a shift from a 'tool' to a 'creative partner'.

Simon Willison · Apr 16, 2026

Trusted access for the next era of cyber defense

OpenAI launches GPT-5.4-Cyber, a model fine-tuned for defensive cybersecurity, and its "Trusted Access" program, signaling that leading AI companies are making cybersecurity a key battleground while seeking a new balance between safety and openness.

Simon Willison · Apr 15, 2026

Your harness, your memory

LangChain's CEO argues that agent harnesses are inextricably tied to memory, and that using a closed harness means ceding control of your memory to a third party, creating significant lock-in.

LangChain Blog · Apr 11, 2026

Deep Agents v0.5

LangChain introduces async subagents for its Deep Agents framework, enabling parallel task delegation and removing blocking bottlenecks in agent workflows.

LangChain Blog · Apr 8, 2026
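
The idea of delegating to subagents in parallel instead of blocking on each one, as the Deep Agents entry above describes, can be sketched with Python's standard asyncio module. This is not the Deep Agents API: subagent and orchestrate are hypothetical names, and the sleep stands in for real model or tool calls.

```python
import asyncio

async def subagent(name, delay):
    # Placeholder for real work such as a model call or a tool invocation.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def orchestrate():
    # Start all subagents first so they run concurrently, then wait for all of them.
    tasks = [asyncio.create_task(subagent(f"task-{i}", 0.05)) for i in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(orchestrate())
print(results)
```

Because the three tasks overlap, the total wait is close to the slowest subagent rather than the sum of all three; asyncio.gather returns results in the order the tasks were passed in.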

Continual learning for AI agents

Continual learning for AI agents occurs at three layers: model, harness, and context, with context-layer evolution being the most practical and actionable.

LangChain Blog · Apr 6, 2026

How My Agents Self-Heal in Production

A LangChain engineer shares a complete pipeline for AI agents to automatically detect regressions, diagnose issues, and submit fix PRs after deployment, combining statistical methods with intelligent triage to reduce false positives.

LangChain Blog · Apr 4, 2026

Quoting Kyle Daigle

GitHub COO Kyle Daigle reports 1B commits in 2025 and GitHub Actions usage doubling annually, signaling exponential growth in developer activity.

Simon Willison · Apr 4, 2026

Open Models have crossed a threshold

LangChain's evaluations show that open models like GLM-5 and MiniMax M2.7 now match closed frontier models on core agent tasks such as file operations and tool use, at a fraction of the cost and with lower latency.

LangChain Blog · Apr 3, 2026

March 2026: LangChain Newsletter

LangChain is pushing AI agents from experimental prototypes to manageable, collaborative, and securely deployable enterprise productivity tools through features like LangSmith Fleet, Skills, and Sandboxes.

LangChain Blog · Apr 2, 2026

Any Custom Frontend with Gradio's Backend

The introduction of Gradio.Server allows developers to use custom frontend frameworks while enjoying the robust backend support of Gradio, significantly enhancing application development flexibility and efficiency.

Hugging Face Blog · Apr 1, 2026

Liberate your OpenClaw

With restrictions on Claude models in open agent platforms, Hugging Face offers two ways to help users quickly migrate and revive their OpenClaw agents, ensuring continued use of efficient open models.

Hugging Face Blog · Mar 27, 2026

How we build evals for Deep Agents

LangChain shares its core philosophy for building AI agent evaluation systems: more evals aren't better; instead, precisely define and measure the agent behaviors you care about to guide its evolution.

LangChain Blog · Mar 26, 2026

LLM Powered Autonomous Agents

LLM powered autonomous agents combine planning, memory, and tool usage, showcasing their potential in handling complex tasks and indicating a significant shift in work methodologies.

Lilian Weng · Jun 23, 2023

Building a Financial Document Pipeline with LlamaParse

LlamaParse's 'agentic parsing' capability automatically transforms messy financial PDFs (like pay stubs and brokerage statements) into structured data and enables cross-document analysis, significantly boosting automation in workflows like loan underwriting.

LlamaIndex Blog ·

Building a Financial Due Diligence Agent with LiteParse

LlamaIndex demonstrates a financial due diligence AI agent built with just 600 lines of code and no vector database, leveraging LiteParse to extract PDF layout information for precise, highlighted source citations in answers.

LlamaIndex Blog ·

Introducing Claude Opus 4.7

Anthropic's Claude Opus 4.7 release focuses on enhanced reliability for complex, long-running tasks and self-verification capabilities, signaling a shift from AI as a tool to a trustworthy work partner.

Anthropic News ·

Introducing Claude Opus 4.8

Anthropic releases Claude Opus 4.8, with core breakthroughs in significantly improving the reliability, judgment, and long-running consistency of Agent tasks, marking AI's practical shift from 'usable' to 'trustworthy'.

Anthropic News ·

Is grep all you need? Lexical VS Semantic Search for Agents

The article explores the pros and cons of traditional text search tools like grep versus semantic search (RAG) in the AI Agent era, highlighting grep's limitations with unstructured documents and large-scale corpora, and proposes hybrid solutions.

LlamaIndex Blog ·

LlamaIndex Newsletter 2026-04-14

LlamaIndex releases ParseBench, the first OCR benchmark for AI agents, alongside tools tackling structural loss and security in document parsing, marking a paradigm shift from text extraction to contextual understanding.

LlamaIndex Blog ·

LlamaIndex Newsletter 2026-04-21

LlamaIndex launches ParseBench, the first document OCR benchmark for AI agents, alongside new parsing tools and benchmark results, marking a shift towards quantifiable document intelligence.

LlamaIndex Blog ·

LlamaIndex Newsletter 5-19-26

LlamaIndex introduces ParseBench, the first OCR benchmark designed specifically for AI agents, alongside open-sourcing a local document parsing server and a secure sandboxed CLI agent, signaling a shift in document processing towards agent-native infrastructure.

LlamaIndex Blog ·

Anthropic acquires Stainless

Anthropic acquires SDK toolmaker Stainless to strengthen AI Agent connectivity with external tools and data, signaling a shift in competition from models to Agent ecosystem building.

Anthropic News ·

OCR Accuracy Explained: What Impacts Performance and How to Improve It

OCR accuracy is not a single number but a multi-layered issue spanning characters, words, and semantic fields. Its real-world performance is impacted by image quality, document type, and hardware, and improving it requires building a complete processing pipeline.

LlamaIndex Blog ·
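
The point in the entry above that OCR accuracy differs by level can be made concrete with two widely used metrics, character error rate (CER) and word error rate (WER). The article summary does not name these metrics, so treating them as the character and word layers is an assumption, and the sample strings are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

reference = "Total due 120.50"
ocr_output = "Total due 128.50"  # one misread digit

print(f"CER: {cer(reference, ocr_output):.4f}")
print(f"WER: {wer(reference, ocr_output):.4f}")
```

One misread digit is a small character-level error (1 of 16 characters) but a large word-level one (1 of 3 words), and at the field level the extracted amount is simply wrong, which is why a single accuracy number can mislead.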

OCR for Tables: How to Extract Structured Data from Documents

The article delves into the challenges of extracting table data from documents, highlighting that it's not just about character recognition, but also involves layout analysis, structural reconstruction, and contextual reasoning, marking a key step towards intelligent document processing.

LlamaIndex Blog ·
BitByAI — AI-powered, AI-evolved AI News