AIBearisharXiv – CS AI · 4d ago7/10
🧠Researchers demonstrate MIRAGE, a technique that exploits vision-language model vulnerabilities in mobile GUI agents by injecting adversarial text into user-generated content regions. The attack achieves 23-30% success rates across five VLM agents without modifying apps or operating systems, revealing a critical security gap in AI-powered mobile automation that existing visual-quality defenses cannot reliably prevent.
AIBullisharXiv – CS AI · 5d ago7/10
🧠MobileExplorer is a new framework that enables faster on-device inference for mobile GUI agents by leveraging parallel exploration of UI elements during model reasoning time. The system reduces latency by 23% while maintaining or improving task success rates, addressing privacy and network dependency concerns in mobile AI applications.
AIBullisharXiv – CS AI · 5d ago7/10
🧠GUI-Libra presents a specialized training methodology for native GUI agents that addresses critical gaps between open-source and closed-source systems through action-aware supervised fine-tuning and improved reinforcement learning with partial verifiability. The work introduces an 81K curated GUI reasoning dataset and demonstrates consistent improvements across web and mobile benchmarks without requiring expensive online data collection.
AIBearisharXiv – CS AI · Apr 157/10
🧠Researchers have identified critical vulnerabilities in mobile GUI agents powered by large language models, revealing that third-party content in real-world apps causes these agents to fail significantly more often than benchmark tests suggest. Testing on 122 dynamic tasks and over 3,000 static scenarios shows misleading rates of 36-42%, raising serious concerns about deploying these agents in commercial settings.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed HyMEM, a brain-inspired hybrid memory system that significantly improves GUI agents' ability to interact with computers. The system uses graph-based structured memory combining symbolic nodes with trajectory embeddings, enabling smaller 7B/8B models to match or exceed performance of larger closed-source models like GPT-4o.
🧠 GPT-4
AIBullisharXiv – CS AI · Mar 67/10
🧠WebFactory introduces a fully automated reinforcement learning pipeline that efficiently transforms large language models into GUI agents without requiring unsafe live web interactions or costly human-annotated data. The system demonstrates exceptional data efficiency by achieving comparable performance to human-trained agents while using synthetic data from only 10 websites.
AINeutralarXiv – CS AI · Mar 56/10
🧠Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.
AIBearisharXiv – CS AI · Mar 47/104
🧠Researchers discovered a critical security vulnerability in AI-powered GUI agents on Android, where malicious apps can hijack agent actions without requiring dangerous permissions. The 'Action Rebinding' attack exploits timing gaps between AI observation and action, achieving 100% success rates in tests across six popular Android GUI agents.
AIBullisharXiv – CS AI · Feb 277/107
🧠Researchers introduce GUIPruner, a training-free framework that addresses efficiency bottlenecks in high-resolution GUI agents by eliminating spatiotemporal redundancy. The system achieves 3.4x reduction in computational operations and 3.3x speedup while maintaining 94% of original performance, enabling real-time navigation with minimal resource consumption.
AIBullisharXiv – CS AI · 4d ago6/10
🧠Researchers introduce PlaytestArena and Play2Code, systems that use GUI agents to evaluate and iteratively improve game generation by having AI agents play games rather than relying on one-shot code generation. Play2Code achieves 66.8% success on game rubrics through a dialogue loop between coding and playing agents, significantly outperforming baseline approaches.
AIBullisharXiv – CS AI · May 116/10
🧠Researchers introduce LiteGUI, a novel training framework that enhances lightweight GUI agents (2B-3B parameters) through reinforcement learning and knowledge distillation, achieving competitive performance with much larger models. The approach addresses key limitations of traditional supervised fine-tuning by incorporating multi-solution learning and dynamic retrieval mechanisms to reduce hallucinations in automated interface interaction tasks.
AINeutralarXiv – CS AI · May 16/10
🧠Researchers present a comprehensive framework for combining Reinforcement Learning with GUI agents to create more autonomous digital systems. The work identifies three key RL approaches (Offline, Online, and Hybrid), reveals emerging technical trends like world-model-based training and multi-tier reward architectures, and proposes a roadmap toward safer, more reliable automation systems.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers introduce the 'Turing Test on Screen,' a framework for measuring how well autonomous GUI agents can mimic human behavior to evade detection systems. The study reveals that current LLM-based agents exhibit unnatural interaction patterns and proposes humanization methods to improve their ability to operate undetected in adversarial digital environments.
AIBullisharXiv – CS AI · Apr 146/10
🧠Researchers propose Trajectory Induced Preference Optimization (TIPO), a novel method for training mobile GUI agents to respect user privacy preferences while maintaining task execution capability. The approach addresses the challenge that privacy-conscious users generate structurally different execution patterns than utility-focused users, requiring specialized optimization techniques to properly align agent behavior with individual privacy preferences.
AIBullisharXiv – CS AI · Mar 36/1010
🧠Researchers developed ST-Lite, a training-free KV cache compression framework that accelerates GUI agents by 2.45x while using only 10-20% of the cache budget. The solution addresses memory and latency constraints in Vision-Language Models for autonomous GUI interactions through specialized attention pattern optimization.
AINeutralHugging Face Blog · Jun 64/105
🧠ScreenSuite is introduced as a comprehensive evaluation suite specifically designed for GUI (Graphical User Interface) agents. The tool appears to provide testing and assessment capabilities for AI systems that interact with graphical interfaces.