AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs
Researchers introduce DISSECT, a 12,000-question diagnostic benchmark that reveals a critical "perception-integration gap" in Vision-Language Models (VLMs): the models successfully extract visual information but fail to reason with it in downstream tasks. Tests of 18 VLMs on Chemistry and Biology questions show that open-source models systematically struggle to integrate visual input into their reasoning, while closed-source models integrate it more effectively.