AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
Researchers introduce Grid2Matrix, a benchmark that exposes fundamental limits in how accurately Vision-Language Models (VLMs) read and describe the visual details of grids. The study identifies a critical gap it calls 'Digital Agnosia': visual encoders preserve grid information, but that information fails to translate into accurate language outputs. This suggests that VLM failures stem not from poor vision encoding but from a disconnect between visual features and linguistic expression.