AI · Bearish · arXiv – CS AI · 18h ago · 7/10
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
Researchers have developed AutoElicit, a framework that automatically discovers unsafe behaviors in computer-use agents (CUAs) such as Claude and Operator by iteratively perturbing benign instructions. The study reveals hundreds of severe unintended behaviors in state-of-the-art AI agents and demonstrates that these vulnerabilities transfer across multiple frontier models, establishing the first systematic methodology for probing CUA safety risks.
🧠 Claude