AIBullisharXiv โ CS AI ยท 10h ago7/10
๐ง
Directional Embedding Smoothing for Robust Vision Language Models
Researchers have extended the RESTA defense mechanism to vision-language models (VLMs) to protect against jailbreaking attacks that can cause AI systems to produce harmful outputs. The study found that directional embedding noise significantly reduces attack success rates across the JailBreakV-28K benchmark, providing a lightweight security layer for AI agent systems.