Tag: Cybersecurity (11 articles)

Patterns for Building Cybersecurity Evals

This article breaks down the four core components of cybersecurity evaluations and introduces multi-level tasks for more granular measurement of AI's offensive and defensive capabilities.

Eugene Yan · Jun 21, 2026

CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

A specialized 4B cybersecurity model matches or outperforms an 8B generalist on key tasks, revealing the trend towards 'small, specialized, and local' AI deployment in security.

Hugging Face Blog · May 9, 2026

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

The UK's AI Security Institute found that GPT-5.5's vulnerability-finding capabilities are comparable to those of the leading Claude Mythos model, but its general availability marks a new phase in AI-driven cybersecurity offense and defense.

Simon Willison · May 1, 2026

Quoting Bobby Holley

Mozilla's CTO reports that Firefox, using Anthropic's Claude AI, identified and fixed 271 vulnerabilities in an assessment, marking a shift in which AI moves from an 'assistant' to a 'lead' role in security defense.

Simon Willison · Apr 22, 2026

AI and the Future of Cybersecurity: Why Openness Matters

Hugging Face argues that the rise of AI-driven autonomous cybersecurity systems (like Mythos) reveals the critical structural advantage of open source in enabling distributed defense and mitigating risks from closed-source software.

Hugging Face Blog · Apr 21, 2026

Trusted access for the next era of cyber defense

OpenAI launches GPT-5.4-Cyber, a model fine-tuned for defensive cybersecurity, and its "Trusted Access" program, signaling that leading AI companies are making cybersecurity a key battleground while seeking a new balance between safety and openness.

Simon Willison · Apr 15, 2026

Cybersecurity Looks Like Proof of Work Now

AI security reviews reveal that system security is evolving into an economic game: defenders must spend more computational resources (tokens) than attackers to ensure safety, which unexpectedly boosts the value of open-source projects.

Simon Willison · Apr 15, 2026

Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked

A real-world attack where hackers bypassed Instagram's account recovery by simply asking Meta's AI chatbot to link a new email, revealing the severe risks of wiring AI directly into critical systems without proper authorization boundaries.

Simon Willison ·

More details on Fable 5’s cyber safeguards and our jailbreak framework

Anthropic reveals its four-tier safety classifier for Fable 5 and a draft jailbreak severity framework, aiming to set a common language for AI risk communication across the industry and with governments.

Anthropic News ·

Government of Alberta uses Claude to find and fix cybersecurity vulnerabilities across government systems

The Government of Alberta used 50 Claude Code agents to scan 466 million lines of code in 20 hours, finding and fixing security vulnerabilities and compressing years of audit work into a single day.

Anthropic News ·

Project Glasswing: An initial update

Anthropic's Project Glasswing, using Claude Mythos Preview, discovered over ten thousand high-severity vulnerabilities in critical global software within a month, shifting the core cybersecurity bottleneck from finding flaws to fixing them.

Anthropic News · May 22, 2026