AI · Bearish · arXiv, CS AI · 5h ago · 6/10
Don't Blink: Evidence Collapse during Multimodal Reasoning
Research shows that vision-language models (VLMs) progressively lose visual grounding during reasoning tasks, producing dangerous low-entropy predictions that appear confident but are not supported by visual evidence. The study found that attention to visual evidence drops by more than 50% over the course of reasoning across multiple benchmarks, and concludes that safe AI deployment requires task-aware monitoring.