AI · Bullish · arXiv – CS AI · 15h ago · 6/10
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
Researchers introduce HyperTrack, a large-scale dataset of more than 16,000 mobile GUI navigation tasks across more than 650 Chinese applications, and GUIEvalKit, an open-source toolkit for benchmarking Vision-Language Models. The study reports that reinforcement-based fine-tuning substantially outperforms supervised learning on mobile automation tasks, a result relevant to building more capable AI agents.