AI · Bearish · arXiv – CS AI · 14h ago · 7/10
Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks
Researchers used maze-solving tasks to test whether large language models develop spatial world models, and found that leading models such as Gemini, GPT-4, and Claude struggle with spatial reasoning. Accuracy varies dramatically with input format (from 16% to 86%), suggesting that LLMs lack a robust, format-invariant spatial understanding and do not build true internal world models.
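To make "input format" concrete, here is a minimal Python sketch. It assumes a simple character-grid maze. The maze, the two text encodings, and the helper names (`as_ascii`, `as_coordinates`, `shortest_path_length`) are hypothetical illustrations, not the prompts or code used in the study. The sketch renders one maze two ways and computes the ground-truth shortest path with breadth-first search, which is the kind of reference answer a model's reply could be scored against.

```python
from collections import deque

# Hypothetical 5x5 maze (not from the paper):
# '#' = wall, '.' = open cell, 'S' = start, 'G' = goal.
MAZE = [
    "S..#.",
    "##.#.",
    ".....",
    ".###.",
    "...#G",
]


def find(grid, ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in grid")


def as_ascii(grid):
    """Format 1 (hypothetical): the raw character grid, one row per line."""
    return "\n".join(grid)


def as_coordinates(grid):
    """Format 2 (hypothetical): the same maze as start/goal plus a wall list."""
    walls = [(r, c) for r, row in enumerate(grid)
             for c, ch in enumerate(row) if ch == "#"]
    return f"start={find(grid, 'S')}; goal={find(grid, 'G')}; walls={walls}"


def shortest_path_length(grid):
    """BFS over 4-connected open cells; returns the move count, or None if unreachable."""
    start, goal = find(grid, "S"), find(grid, "G")
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (r, c), dist = frontier.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), dist + 1))
    return None


if __name__ == "__main__":
    print(as_ascii(MAZE))
    print(as_coordinates(MAZE))
    print("shortest path:", shortest_path_length(MAZE), "moves")
```

Both strings describe the identical maze. A model with a format-invariant spatial representation should therefore answer equally well from either one, and a large accuracy gap between encodings is the kind of evidence the study reads as a missing world model.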
GPT-5 · Claude · Gemini