AI Bullish — arXiv CS AI · 14h ago · 7/10
Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
Researchers propose Risk Awareness Injection (RAI), a lightweight, training-free framework that enhances vision-language models' ability to recognize unsafe content by amplifying risk signals in their feature space. The method maintains model utility while significantly reducing vulnerability to multimodal jailbreak attacks, addressing a critical security gap in VLMs.
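The summary says RAI works by "amplifying risk signals in the feature space" without training. The paper's exact mechanism is not given here, so the following is only a minimal sketch of one common way such feature-space steering is done: estimate a "risk direction" as the difference of mean features between unsafe and safe examples, then shift activations along that axis at inference time. The function names (`risk_direction`, `inject_risk_awareness`) and the scaling factor `alpha` are hypothetical, not from the paper.

```python
import numpy as np

def risk_direction(safe_feats: np.ndarray, unsafe_feats: np.ndarray) -> np.ndarray:
    """Unit vector along the mean difference between unsafe and safe features.

    This difference-of-means axis is a standard, training-free way to
    locate a concept (here, 'risk') in a model's feature space.
    """
    d = unsafe_feats.mean(axis=0) - safe_feats.mean(axis=0)
    return d / np.linalg.norm(d)

def inject_risk_awareness(features: np.ndarray, direction: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """Shift each feature vector by alpha along the risk direction.

    A small alpha nudges representations toward the 'risky' region so the
    model's safety behavior fires more readily, without retraining.
    """
    return features + alpha * direction

# Toy demonstration with synthetic 8-dim features.
rng = np.random.default_rng(0)
safe = rng.normal(0.0, 1.0, size=(32, 8))    # stand-in for benign activations
unsafe = rng.normal(1.0, 1.0, size=(32, 8))  # stand-in for unsafe activations

d = risk_direction(safe, unsafe)
steered = inject_risk_awareness(safe, d, alpha=0.5)
```

Because `d` is unit-norm, each steered vector's projection onto the risk axis increases by exactly `alpha`, while components orthogonal to `d` are untouched, which is the intuition behind preserving utility on benign inputs.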