Jailbreak Defense · Tag

Prompt Injection as Role Confusion

Research finds that LLMs rely on the style of text, rather than on explicit tags, to decide what counts as an instruction. Destyling cuts prompt injection success rates from 61% to 10%.

Simon Willison · Jun 23, 2026