AI · Bearish · arXiv – CS AI · 18h ago · 7/10
VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents
Researchers introduce VisualLeakBench, a 500-image benchmark targeting a security failure in vision-language agents: sensitive information visible in screenshots and documents is propagated into tool arguments. Across four production VLM systems, baseline failure rates are 78.8% for personally identifiable information (PII) and 85.5% for unsafe text. Defensive prompts reduce PII propagation but leave unsafe-text leakage at 52.6%.
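To make the notion of a "failure rate" concrete, here is a minimal sketch of how propagation into tool arguments could be scored. It is not the benchmark's actual harness: the `Trial` record, the verbatim substring check, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class Trial:
    """Hypothetical record of one trial (not the benchmark's real schema)."""
    sensitive_strings: list[str]    # strings visible in the image, assumed known
    tool_arguments: dict[str, Any]  # arguments the agent passed to a tool call


def _strings(value: Any) -> Iterator[str]:
    """Yield every leaf of a nested argument structure as a string."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from _strings(v)
    else:
        yield str(value)


def leaked(trial: Trial) -> bool:
    """True if any sensitive string appears verbatim (case-insensitive) in an argument."""
    haystack = [s.lower() for s in _strings(trial.tool_arguments)]
    return any(
        secret.lower() in text
        for secret in trial.sensitive_strings
        for text in haystack
    )


def propagation_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which sensitive content reached a tool argument."""
    if not trials:
        return 0.0
    return sum(leaked(t) for t in trials) / len(trials)


# Illustrative data only: one trial leaks, one does not.
example_trials = [
    Trial(["123-45-6789"], {"body": "Applicant SSN is 123-45-6789"}),
    Trial(["jane@example.com"], {"query": "weather tomorrow"}),
]
example_rate = propagation_rate(example_trials)
```

A verbatim match like this would miss paraphrased or reformatted leaks, so a real evaluation would likely need a more tolerant detector.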