AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Do Agents Know What They Can't Do? Evaluating Feasibility Awareness in Tool-Using Agents
Researchers propose FeasiGen, a framework that automatically generates benchmarks of infeasible tasks to evaluate whether AI agents recognize when a task cannot be completed with the available tools. Tests across nine models reveal critical weaknesses: agents continue executing impossible tasks up to 73.9% of the time, though multi-agent architectures perform better.