AI · Bearish · arXiv – CS AI · 5h ago
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Researchers introduce the Procedure-Aware Evaluation (PAE) framework, which assesses how AI agents complete tasks, not just whether they succeed. The study finds that 27-78% of reported agent successes are "corrupt successes": outcomes counted as successful that mask procedural violations and reliability issues.