AIBearisharXiv – CS AI · 9h ago7/10
🧠
Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming
Researchers challenge the credibility of recent computer-using agent (CUA) red-teaming studies by reproducing published prompt-injection attacks against frontier models Claude Sonnet 4.6 and GPT-5.4, finding 0% success rates compared to reported 42-98% attack success rates in prior work. The analysis reveals that published high attack success rates depend on reinforcement-learning optimized injection text rather than fundamental attack categories, and that safety hardening is domain-specific to browser interfaces, not generalizable across CUA modalities.
🧠 GPT-5🧠 Claude🧠 Sonnet