Tag: Agents (23 articles)

Data for Agents

NVIDIA experts argue that open data and synthetic data are key to building reliable AI agents: open data for explainability, synthetic data for scaling without exposing secrets.

Hugging Face Blog · Jul 9, 2026

Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

Simon Willison used DSPy to automatically evaluate and improve Datasette Agent's SQL prompts, uncovering hidden flaws like column-name guessing and highlighting the shift from manual prompt tuning to scientific iteration.

Simon Willison · Jul 3, 2026

Quoting Jon Udell

Jon Udell argues that we should ditch the phrase “human in the loop” and instead adopt “agent-assisted process,” inviting AI agents into our own development loop rather than ceding authority to machines.

Simon Willison · Jun 29, 2026

What happened after 2,000 people tried to hack my AI assistant

A public AI security challenge saw 2,000 people attempt to leak secrets via prompt injection, with all 6,000 attempts failing, reflecting progress in frontier model defenses but also revealing lingering risks.

Simon Willison · Jun 27, 2026

Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor

Poolside's 33B-parameter agentic coding model, Laguna XS.2, achieves 2-3x inference speedup without quality loss through native vLLM integration, DFlash speculative decoding, and LLM Compressor quantization.

vLLM Blog · May 28, 2026

Live blog: Code w/ Claude 2026

Anthropic showcased a comprehensive shift from a single model to a platform-centric, multi-agent collaboration paradigm at Code w/ Claude, focusing on enabling developers to build and run complex, long-duration agent tasks more efficiently.

Simon Willison · May 6, 2026

Our AI started a cafe in Stockholm

An experiment in which an AI autonomously runs a real-world cafe sparked ethical debate after it made absurd purchases and caused trouble for outside parties, revealing a deeper issue: AI agents lack a sense of boundaries in the physical world.

Simon Willison · May 6, 2026

AI evals are becoming the new compute bottleneck

AI evaluation costs are skyrocketing: a single agent benchmark run can cost tens of thousands of dollars, and the inherent complexity of these evaluations makes them hard to compress, creating a new compute bottleneck for AI development.

Hugging Face Blog · Apr 30, 2026

Join us at PyCon US 2026 in Long Beach - we have new AI and security tracks this year

PyCon US 2026 features a dedicated AI track for the first time, covering topics from local model deployment to async agent patterns, signaling the Python community's systematic integration of AI into its core ecosystem and developer workflows.

Simon Willison · Apr 18, 2026

Meet HoloTab by HCompany. Your AI browser companion.

HCompany launches HoloTab, a free Chrome extension that simplifies complex web automation into natural language instructions via its 'show once, run anytime' Routines feature, marking the democratization of computer-use AI.

Hugging Face Blog · Apr 15, 2026

Deep Agents Deploy: an open alternative to Claude Managed Agents

LangChain launches Deep Agents Deploy, an open-source, model-agnostic agent framework and deployment solution aimed at breaking the lock-in of closed platforms by emphasizing memory ownership as the core of future agent competition.

LangChain Blog

How Agentic AI Improves Document Extraction Accuracy and Automation

The article argues that a 'plan-act-verify' agent loop is shifting document processing from mechanical pattern matching to a cognitive task with spatial awareness and contextual reasoning, breaking through the limitations of traditional OCR.

LlamaIndex Blog

Introducing Claude Opus 4.7

Anthropic releases Claude Opus 4.7, focusing on enhanced complex coding and long-running task capabilities, with its 'self-verification' mechanism marking a key step towards more autonomous AI agents.

Anthropic News

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA releases its omni-modal understanding model Nemotron 3 Nano Omni, setting new open-source benchmarks across document understanding, audio-video understanding, and agentic tasks, while delivering significantly higher efficiency than comparable models.

Hugging Face Blog

LlamaIndex Newsletter 7-8-26

LlamaIndex introduced Retrieval Harness and an MCP restructure, enabling agents to actively traverse corpora with filesystem tools like list and grep, turning retrieval from guesswork into verification.

LlamaIndex Blog

Introducing Claude for Small Business

Anthropic launches Claude for Small Business, embedding AI into daily operational tools via pre-built connectors and workflows to address the shallow adoption of AI in small businesses.

Anthropic News

Microsoft Copilot Cowork Exfiltrates Files

A critical security flaw in Microsoft Copilot Cowork allowed attackers to exfiltrate user files via prompt injection by exploiting auto-sent emails and pre-authenticated download links.

Simon Willison

OCR for KYC: Why Standard Text Extraction Falls Short of Compliance Requirements

The article reveals the fundamental shortcomings of traditional OCR in financial KYC compliance, highlighting its failure with real-world documents and proposing 'Agentic OCR' as the solution.

LlamaIndex Blog

Introducing Claude Sonnet 5

Anthropic's Sonnet 5 delivers agentic performance close to the Opus flagship at significantly lower cost, enabling developers to build powerful autonomous agents with mid-tier models.

Anthropic News

Serving Agentic Workloads at Scale with vLLM x Mooncake

vLLM integrates Mooncake's distributed KV cache to solve the bottleneck of recomputing long context prefixes in agentic workloads, achieving a 3.8x throughput increase and a 46x reduction in time-to-first-token.

vLLM Blog

Vibe coding and agentic engineering are getting closer than I'd like

Veteran engineer Simon Willison observes that as AI coding tools become more reliable, the line he once drew between 'vibe coding' and 'agentic engineering' is blurring, raising new questions about code review responsibility and trust.

Simon Willison

Why Single-Pass Extraction Fails and What Deep Extraction Actually Solves

Single-pass extraction fails silently on complex documents, while deep extraction uses an iterative, agent-driven verification loop to achieve near-perfect accuracy, making it essential for production workflows.

LlamaIndex Blog

Your harness, your memory

The article argues that agent harnesses are inextricably tied to memory; using a closed or API-based harness means ceding control of your agent's memory to a third party, creating deep lock-in. Memory should be open.

LangChain Blog