AI · Neutral · arXiv – CS AI · 9h ago · 7/10
Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
Researchers introduce PhoneSafety, a benchmark of 700 safety-critical moments across mobile apps, revealing that stronger AI phone-use agents don't necessarily make safer decisions at risky moments. The study distinguishes between genuine safety judgment and mere inability to act, challenging how AI safety in mobile agents is currently evaluated.