arXiv – CS AI
GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing
Researchers introduce GUITestScape, a new benchmark for evaluating how well AI agents can autonomously test Android applications. They also introduce GUIJudge, an evaluator that detects both interaction and display defects, including defects not covered by predefined annotations. The work addresses a gap in current GUI testing evaluation: instead of scoring only an agent's final outcome, it assesses the agent's behavior throughout the testing process.