AI · Neutral · arXiv – CS AI · 6h ago · 6/10
WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis
Researchers introduce WorldCoder-Bench, a comprehensive benchmark for evaluating how well AI language models generate interactive 3D web environments built with Three.js. The benchmark shows that current frontier models achieve only 19.9% to 27.8% verification coverage, with failures stemming primarily from state management issues rather than missing visual elements.