AI · Neutral · arXiv – CS AI · 14h ago · 6/10
GroundAct: Can LLM Agents Ground Actions in Environmental States?
Researchers introduce GroundAct, a benchmark revealing that LLM agents fail dramatically when task feasibility depends on environmental context rather than explicit instructions: success rates drop from 85-96% to 29-53%. The study identifies action grounding (inferring feasibility from environmental state) as a fundamental capability gap that scaling alone cannot solve.