AIBearish · arXiv — CS AI · 17h ago · 7/10
🧠
Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
Researchers have developed SAHA (Safety Attention Head Attack), a jailbreak framework that exploits vulnerabilities in the deeper attention layers of open-source large language models. Rather than manipulating surface-level prompts, the method targets insufficiently aligned attention heads directly, improving attack success rates by 14% over existing techniques.
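The summary doesn't spell out SAHA's procedure, but the core idea of intervening on individual attention heads can be illustrated with a toy sketch. The snippet below (all names, shapes, and weights are hypothetical, not the paper's code) implements minimal multi-head self-attention in numpy and zeroes one head's contribution before the output projection — the basic ablation used to probe which heads a behavior depends on:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, ablate_heads=()):
    """Toy multi-head self-attention over a (seq, d) input.

    Heads listed in `ablate_heads` contribute nothing to the output,
    mimicking head-level ablation as an interpretability probe.
    """
    seq, d = x.shape
    d_h = d // n_heads
    # Project and split into per-head subspaces: (seq, n_heads, d_h).
    q = (x @ w_q).reshape(seq, n_heads, d_h)
    k = (x @ w_k).reshape(seq, n_heads, d_h)
    v = (x @ w_v).reshape(seq, n_heads, d_h)
    out = np.zeros_like(q)
    for h in range(n_heads):
        if h in ablate_heads:
            continue  # silenced head: its slice stays zero
        scores = q[:, h] @ k[:, h].T / np.sqrt(d_h)
        out[:, h] = softmax(scores) @ v[:, h]
    # Concatenate heads and apply the output projection.
    return out.reshape(seq, d) @ w_o

rng = np.random.default_rng(0)
d, n_heads, seq = 8, 2, 4
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
x = rng.standard_normal((seq, d))

full = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, ablate_heads={1})
print(float(np.abs(full - ablated).max()))  # nonzero: head 1 mattered
```

In a real model one would hook the attention module of a loaded checkpoint rather than reimplement it, and SAHA presumably goes beyond simple ablation to craft adversarial inputs against the identified heads; this sketch only shows the head-isolation primitive.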