AIBearisharXiv โ CS AI ยท 1d ago7/10
๐ง
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
Researchers demonstrate that Claude Code AI agent can autonomously discover novel adversarial attack algorithms against large language models, achieving significantly higher success rates than existing methods. The discovered attacks achieve up to 40% success rate on CBRN queries and 100% attack success rate against Meta-SecAlign-70B, compared to much lower rates from traditional methods.
๐ง Claude