AIBearisharXiv – CS AI · Apr 147/10
🧠
Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks
Researchers tested whether large language models develop spatial world models through maze-solving tasks, finding that leading models like Gemini, GPT-4, and Claude struggle with spatial reasoning. Performance varies dramatically (16-86% accuracy) depending on input format, suggesting LLMs lack robust, format-invariant spatial understanding rather than building true internal world models.
🧠 GPT-5🧠 Claude🧠 Gemini