AI · Bearish · arXiv – CS AI · 10h ago · 7/10
The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space
Researchers show that multimodal large language models achieve high scores on visual reasoning benchmarks by exploiting a 'Cartesian Shortcut': they lean on grid-based layouts that convert into explicit text coordinates instead of performing genuine visual understanding. The Polaris-Bench study finds that frontier models' accuracy collapses from 70–83% to 31–39% when the benchmarks are reformulated in polar coordinate space, exposing critical deficiencies in topology-invariant reasoning.
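The summary does not say how Polaris-Bench constructs its polar variants, so the following is only a minimal sketch of the kind of change involved, under that assumption. In a Cartesian grid, a cell's position is a pair of integer (row, col) indices that map directly onto text. The same position expressed in polar form becomes a continuous (radius, angle) pair, and angular neighborhoods wrap around, which plain row/column arithmetic does not capture. Both helpers, `grid_cell_to_polar` and `polar_sector_neighbors`, are hypothetical illustrations, not the benchmark's code.

```python
import math


def grid_cell_to_polar(row: int, col: int, n_rows: int, n_cols: int) -> tuple[float, float]:
    """Return (radius, angle_degrees) of a grid cell's center about the grid center.

    Hypothetical helper for illustration; not taken from Polaris-Bench.
    """
    # Cartesian center of the cell, origin at the grid center, y axis pointing up.
    x = (col + 0.5) - n_cols / 2
    y = n_rows / 2 - (row + 0.5)
    radius = math.hypot(x, y)
    # atan2 returns (-180, 180]; shift into [0, 360) so angles read like sector bearings.
    angle = math.degrees(math.atan2(y, x)) % 360
    return radius, angle


def polar_sector_neighbors(sector: int, n_sectors: int) -> set[int]:
    """Angular neighbors of a sector in a ring; the last sector wraps around to sector 0.

    Hypothetical helper: a Cartesian grid has no such wrap-around,
    so (row, col) neighbor arithmetic does not transfer directly.
    """
    return {(sector - 1) % n_sectors, (sector + 1) % n_sectors}


if __name__ == "__main__":
    # Top-right cell of a 3x3 grid: about 1.414 cells from the center, at roughly 45 degrees.
    print(grid_cell_to_polar(0, 2, 3, 3))
    # In an 8-sector ring, sector 7 borders sector 0.
    print(sorted(polar_sector_neighbors(7, 8)))
```

The point of the sketch is that a layout whose positions cannot be read off as integer (row, col) tokens removes the textual shortcut, and adjacency must then be judged from the geometry itself.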