Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off
Researchers identify Refusal-Escape Directions (RED) as mathematical perturbation vectors that explain why aligned LLMs remain vulnerable to jailbreaks. The study reveals structural vulnerabilities arise from fundamental trade-offs between safety mechanisms and model utility, with normalization and residual connections as key exploitable components.