AI · Neutral · arXiv – CS AI · 10h ago · 7/10
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
Researchers introduce EnactToM, a benchmark that tests whether AI agents can understand others' beliefs and act on them in multi-agent embodied environments. Current frontier models score 0% on its functional theory-of-mind tasks, even though they perform better when asked about beliefs directly, revealing a critical gap in AI reasoning capabilities.