AI · Bearish · arXiv – CS AI · Mar 26 · 7/10
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
Researchers developed a method that uses a genetic algorithm to craft persona prompts for jailbreaking large language models, reducing refusal rates by 50-70% across multiple LLMs. The study exposes significant weaknesses in current AI safety mechanisms and shows that these persona prompts make existing jailbreak methods more effective when the two are combined.