AINeutralarXiv – CS AI · 18h ago7/10
🧠
Oversight Has a Capacity: Calibrating Agent Guards to a Subjective, Fatiguing Human
Researchers present an open-source system for overseeing LLM agents taking real-world actions, revealing that human reviewers have only moderate agreement on what constitutes risky behavior and that human fatigue creates an inverted-U safety curve where excessive oversight can paradoxically reduce system safety. The framework reframes agent guardrails as a resource-allocation problem rather than a pure classification challenge.