AIBearisharXiv – CS AI · 6h ago7/10
🧠
Beyond Visual Forensics: Auditing Multimodal Robustness for Synthetic Medical Image Detection
Researchers have identified a critical multimodal vulnerability in vision-language models (VLMs) used for detecting synthetic medical images: when given both image and text data, these models can overweight textual context, causing identical images to receive different authenticity predictions based solely on accompanying metadata changes. The study introduces a benchmark to systematically audit this robustness gap, revealing risks for clinical deployment.