AINeutralarXiv – CS AI · 6h ago6/10
🧠
HLL: Can Agents Cross Humanity's Last Line of Verification?
Researchers introduced HLL (Humanity's Last Line of Verification), a benchmark testing whether multimodal AI agents can bypass CAPTCHA protections designed to verify human users. Testing eight frontier models revealed significant brittleness: agent performance varies sharply across CAPTCHA types, degrades under realistic conditions, and fails when solutions must be supported by valid action traces, exposing gaps in localization, action calibration, and process consistency.