AI · Neutral · arXiv – CS AI · 3h ago · 6/10
On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning
Researchers demonstrate that Vision Transformers (ViTs) face fundamental architectural limits in spatial reasoning that stem from computational complexity constraints. By framing spatial understanding as a group homomorphism problem, they prove that constant-depth ViTs cannot capture non-solvable spatial structures such as 3D rotations, revealing a theoretical gap between the complexity class these tasks require and the class such models can compute.
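
To make "non-solvable" concrete, here is a minimal sketch that is not taken from the paper. It computes the derived series (repeated commutator subgroups) of two small permutation groups. A group is solvable exactly when this series reaches the trivial group. The choice of examples is an assumption made for illustration: S4 as a solvable case, and A5, which is isomorphic to the rotational symmetry group of the icosahedron and so sits inside the 3D rotation group, as a non-solvable case. All function names are hypothetical helpers.

```python
from itertools import product


def compose(p, q):
    """Return the permutation that applies q first, then p."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(generators, degree):
    """Return the subgroup generated by the given permutations.

    In a finite group, closing under multiplication by the generators
    is enough, because every inverse is a positive power.
    """
    identity = tuple(range(degree))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def derived_subgroup(group, degree):
    """Return the subgroup generated by all commutators a b a^-1 b^-1."""
    commutators = {
        compose(compose(a, b), compose(inverse(a), inverse(b)))
        for a, b in product(group, repeat=2)
    }
    return generate(commutators, degree)


def derived_series_orders(generators, degree):
    """Return the group orders along the derived series until it stabilizes."""
    group = generate(generators, degree)
    orders = [len(group)]
    while True:
        nxt = derived_subgroup(group, degree)
        if len(nxt) == len(group):
            return orders
        orders.append(len(nxt))
        group = nxt


def is_solvable(generators, degree):
    """A finite group is solvable iff its derived series ends at order 1."""
    return derived_series_orders(generators, degree)[-1] == 1


# Illustrative generators (assumed examples, not from the paper).
S4_GENERATORS = [(1, 0, 2, 3), (1, 2, 3, 0)]        # a transposition and a 4-cycle
A5_GENERATORS = [(1, 2, 0, 3, 4), (1, 2, 3, 4, 0)]  # a 3-cycle and a 5-cycle

print(derived_series_orders(S4_GENERATORS, 4))  # [24, 12, 4, 1]
print(derived_series_orders(A5_GENERATORS, 5))  # [60]
print(is_solvable(S4_GENERATORS, 4), is_solvable(A5_GENERATORS, 5))  # True False
```

The series for S4 shrinks to the trivial group, while A5 equals its own commutator subgroup and never shrinks. The summary's claim concerns groups of the second kind.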