AINeutralarXiv – CS AI · 14h ago6/10
🧠
AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials
Researchers introduced AtomWorld, a benchmark for evaluating how well large language models can perform spatial reasoning tasks in materials science, specifically atomic structure manipulation. The study reveals that current LLMs like Claude Opus 4.6 struggle with complex spatial operations, achieving success rates below 12% for rotation tasks, suggesting they function better as collaborative tools than autonomous scientific agents.
🧠 Claude🧠 Opus