y0news
AnalyticsDigestsSourcesRSSAICrypto
#vulnerability-research2 articles
2 articles
AIBearisharXiv โ€“ CS AI ยท 17h ago7/10
๐Ÿง 

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Researchers have developed SAHA (Safety Attention Head Attack), a new jailbreak framework that exploits vulnerabilities in deeper attention layers of open-source large language models. The method improves attack success rates by 14% over existing techniques by targeting insufficiently aligned attention heads rather than surface-level prompts.

AIBearisharXiv โ€“ CS AI ยท Feb 276/107
๐Ÿง 

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

Researchers evaluated prompt injection and jailbreak vulnerabilities across multiple open-source LLMs including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma. The study found significant behavioral variations across models and that lightweight defense mechanisms can be consistently bypassed by long, reasoning-heavy prompts.