AIBearisharXiv β CS AI Β· 14h ago7/10
π§
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
Researchers introduce Grid2Matrix, a benchmark that reveals fundamental limitations in Vision-Language Models' ability to accurately process and describe visual details in grids. The study identifies a critical gap called 'Digital Agnosia'βwhere visual encoders preserve grid information that fails to translate into accurate language outputsβsuggesting that VLM failures stem not from poor vision encoding but from the disconnection between visual features and linguistic expression.