arXiv – CS AI · 7h ago
Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents
Researchers introduce Teach VLM, a vision-language model that extracts operational knowledge from mobile screen demonstrations and turns it into interpretable instructions for GUI automation agents. The system uses a novel Teach-and-Repeat paradigm, in which task procedures extracted from demonstrations guide downstream execution agents. The approach achieves state-of-the-art performance in operation semantics prediction and improves task success rates in Android environments.