AI · Bearish · arXiv – CS AI · 7h ago · 7/10
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
Researchers have identified a new jailbreak attack, Persona Attack, that exploits an LLM's memory and conversation context to bypass its safety mechanisms. By injecting instructions incrementally over the course of a dialogue, the attack achieves success rates of up to 95%, showing that instructions accumulated in memory can override built-in safety alignment even when the model has undergone conventional safety training.