Jailbreaking Vision-Language Models Through the Visual Modality
Researchers demonstrate four novel jailbreak techniques that exploit the visual modality of vision-language models to bypass safety alignment, revealing a significant gap between text-based and vision-based safety training. Testing across six frontier VLMs shows visual attacks achieve substantially higher success rates than equivalent textual attacks, with implications for the robustness of AI safety measures.