AI · Neutral · arXiv CS AI · 8h ago · 7/10
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
Researchers introduced WebTestBench, a new benchmark for evaluating automated web testing using AI agents and large language models. The study reveals significant gaps between current AI capabilities and industrial deployment needs, with LLMs struggling with test completeness, defect detection, and long-term interaction reliability.