AINeutralarXiv – CS AI · 6h ago7/10
🧠
Position: Reasoning After Perception Means Reasoning Without Vision
Researchers challenge the assumption that language reasoning can compensate for vision-language model weaknesses, arguing that deferring visual reasoning to text collapses spatial information and degrades perception to passive encoding. The study introduces the Turing Eye Test to demonstrate tasks requiring visual reasoning in pixel space cannot be solved through text-only reasoning alone, suggesting AI architectures must shift toward reasoning within perception rather than about it.