AI Alignment (tag)

Google DeepMind's new approach to AI safety: guarding against its own AI agents as "potential insiders"

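DeepMind proposes an AI Control roadmap that treats AI agents as potentially untrusted entities. It combines layered defenses, MITRE threat modeling, and the use of AI to monitor AI, so that deployment stays safe even when alignment is imperfect.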

Google DeepMind Blog