🧠 AI | ⚪ Neutral | Importance 6/10

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

arXiv – CS AI | Yuyou Zhang, Radu Corcodel, Chiori Hori, Anoop Cherian, Ding Zhao | March 3, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduced SpinBench, a benchmark for evaluating spatial reasoning in vision-language models (VLMs) that focuses on perspective taking and viewpoint transformations. Tests of 43 state-of-the-art VLMs revealed systematic weaknesses, including a strong egocentric bias and poor rotational understanding. Human participants reached 91.2% accuracy, significantly outperforming the models.

Key Takeaways

→ SpinBench is a cognitively grounded diagnostic benchmark designed specifically to test spatial reasoning in vision-language models.
→ Tests of 43 state-of-the-art VLMs revealed systematic weaknesses in perspective taking, rotational understanding, and handling symmetrical transformations.
→ Human participants achieved 91.2% accuracy on the benchmark, significantly outperforming current VLMs.
→ Human response time correlates strongly with VLM accuracy, suggesting that humans and VLMs share spatial reasoning challenges.
→ The results highlight critical gaps in VLMs' ability to reason about physical space and viewpoint transformations.