AI · Bullish · arXiv - CS AI · 14h ago · 6/10
Tuning Qwen2.5-VL to Improve Its Web Interaction Skills
Researchers fine-tuned Qwen2.5-VL-32B, a leading open-source vision-language model, to improve its ability to autonomously perform web interactions through visual input alone. Using a two-stage training approach that addresses cursor localization, instruction sensitivity, and overconfidence bias, the model's success rate on single-click web tasks improved from 86% to 94%.