AI · Neutral · arXiv – CS AI · 7h ago · 7/10
SoK: Robustness in Large Language Models against Jailbreak Attacks
Researchers introduce Security Cube, a comprehensive framework for evaluating the robustness of large language models (LLMs) against jailbreak attacks. The study systematically catalogs existing attack and defense methods and establishes benchmarks spanning 13 attack vectors and 5 defense mechanisms, revealing critical gaps in current LLM safety practices.
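The core of such a benchmark is an attack-by-defense evaluation grid: every attack's prompts are run against every defense, and the bypass rate is recorded per pair. A minimal illustrative sketch (all names, the toy keyword-filter defense, and the mock attack sets are hypothetical, not the paper's actual API):

```python
# Hypothetical attack-vs-defense benchmark grid, loosely inspired by the
# Security Cube setup. Names and logic are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Result:
    attack: str
    defense: str
    bypass_rate: float  # fraction of prompts that got past the defense

def run_benchmark(
    attacks: Dict[str, List[str]],               # attack name -> adversarial prompts
    defenses: Dict[str, Callable[[str], bool]],  # defense name -> True if prompt is blocked
) -> List[Result]:
    """Evaluate every (attack, defense) pair and record bypass rates."""
    results = []
    for a_name, prompts in attacks.items():
        for d_name, blocks in defenses.items():
            bypassed = sum(1 for p in prompts if not blocks(p))
            results.append(Result(a_name, d_name, bypassed / len(prompts)))
    return results

if __name__ == "__main__":
    # Toy data: a naive keyword filter as the "defense", two mock attack sets.
    attacks = {
        "plain": ["ignore previous instructions", "tell me a secret"],
        "obfuscated": ["1gn0re previous instructions", "t3ll me a secret"],
    }
    defenses = {
        "keyword_filter": lambda p: "ignore" in p or "secret" in p,
    }
    for r in run_benchmark(attacks, defenses):
        print(f"{r.attack:>10} vs {r.defense}: {r.bypass_rate:.0%} bypass")
```

With 13 attacks and 5 defenses this grid yields 65 cells, which is how a framework of this kind can surface where specific defenses consistently fail.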