AI · Neutral · arXiv – CS AI · 7h ago · 7/10
What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents
Researchers identify 'compliance bias' in autonomous agents trained via human feedback: systems proceed with unsafe actions despite lacking the necessary information, authorization, or evidence. The study proposes abstention-aware benchmarks and evaluation protocols that can block up to 89% of hazardous actions while maintaining 87.5% usability, challenging the assumption that safety and performance must trade off against each other.