y0news

#gui-agents News & Analysis

16 articles tagged with #gui-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Researchers demonstrate MIRAGE, a technique that exploits vision-language model vulnerabilities in mobile GUI agents by injecting adversarial text into user-generated content regions. The attack achieves 23-30% success rates across five VLM agents without modifying apps or operating systems, revealing a critical security gap in AI-powered mobile automation that existing visual-quality defenses cannot reliably prevent.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

MobileExplorer is a new framework that enables faster on-device inference for mobile GUI agents by leveraging parallel exploration of UI elements during model reasoning time. The system reduces latency by 23% while maintaining or improving task success rates, addressing privacy and network dependency concerns in mobile AI applications.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

GUI-Libra presents a specialized training methodology for native GUI agents that addresses critical gaps between open-source and closed-source systems through action-aware supervised fine-tuning and improved reinforcement learning with partial verifiability. The work introduces an 81K curated GUI reasoning dataset and demonstrates consistent improvements across web and mobile benchmarks without requiring expensive online data collection.

AI · Bearish · arXiv – CS AI · Apr 15 · 7/10

Mobile GUI Agents under Real-world Threats: Are We There Yet?

Researchers have identified critical vulnerabilities in mobile GUI agents powered by large language models, revealing that third-party content in real-world apps causes these agents to fail significantly more often than benchmark tests suggest. Tests on 122 dynamic tasks and over 3,000 static scenarios show that such content misleads the agents in 36-42% of cases, raising serious concerns about deploying them in commercial settings.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Hybrid Self-evolving Structured Memory for GUI Agents

Researchers developed HyMEM, a brain-inspired hybrid memory system that significantly improves GUI agents' ability to interact with computers. The system uses graph-based structured memory combining symbolic nodes with trajectory embeddings, enabling smaller 7B/8B models to match or exceed performance of larger closed-source models like GPT-4o.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

WebFactory introduces a fully automated reinforcement learning pipeline that efficiently transforms large language models into GUI agents without requiring unsafe live web interactions or costly human-annotated data. The system demonstrates exceptional data efficiency by achieving comparable performance to human-trained agents while using synthetic data from only 10 websites.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 4

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Researchers discovered a critical security vulnerability in AI-powered GUI agents on Android, where malicious apps can hijack agent actions without requiring dangerous permissions. The 'Action Rebinding' attack exploits timing gaps between AI observation and action, achieving 100% success rates in tests across six popular Android GUI agents.
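
The timing gap described above is a time-of-check-to-time-of-use race, and a toy simulation can make it concrete. The sketch below is not the paper's attack or any real Android API: `Screen`, `Device`, `naive_step`, and `guarded_step` are hypothetical names, and the "malicious app" is reduced to a callback that swaps the foreground screen while the agent is still planning. It shows a tap planned against one screen landing on another, and a re-check that narrows the window without fully closing it.

```python
from dataclasses import dataclass


@dataclass
class Screen:
    """Hypothetical screen model: maps an element label to its tap point."""
    owner: str                               # which app drew this screen
    elements: dict[str, tuple[int, int]]

    def element_at(self, point):
        """Return the label of the element at `point`, or None."""
        for label, pos in self.elements.items():
            if pos == point:
                return label
        return None


class Device:
    """Toy device whose foreground screen can change between agent steps."""

    def __init__(self, screen):
        self.screen = screen

    def observe(self):
        return self.screen

    def tap(self, point):
        # The tap lands on whatever is in the foreground *now*.
        return self.screen.owner, self.screen.element_at(point)


def naive_step(device, target, interfere=None):
    """Observe, plan coordinates, then act without looking again."""
    seen = device.observe()                  # time of check
    point = seen.elements[target]            # coordinates from a soon-stale view
    if interfere:
        interfere(device)                    # gap while the model "thinks"
    return device.tap(point)                 # time of use


def guarded_step(device, target, interfere=None):
    """Same as naive_step, but re-check the screen just before tapping."""
    seen = device.observe()
    point = seen.elements[target]
    if interfere:
        interfere(device)
    current = device.observe()
    if current.owner != seen.owner or current.element_at(point) != target:
        return None                          # abort: the screen changed
    return device.tap(point)


benign = Screen("notes_app", {"Save": (100, 200)})
overlay = Screen("other_app", {"Confirm payment": (100, 200)})


def swap(device):
    """Stand-in for another app taking over the foreground."""
    device.screen = overlay


if __name__ == "__main__":
    print(naive_step(Device(benign), "Save", swap))
    print(guarded_step(Device(benign), "Save", swap))
```

The guarded variant still leaves a small window between the re-check and the tap, so it should be read as an illustration of the problem rather than a complete defense.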

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents

Researchers introduce GUIPruner, a training-free framework that addresses efficiency bottlenecks in high-resolution GUI agents by eliminating spatiotemporal redundancy. The system achieves 3.4x reduction in computational operations and 3.3x speedup while maintaining 94% of original performance, enabling real-time navigation with minimal resource consumption.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

GUI Agents for Continual Game Generation

Researchers introduce PlaytestArena and Play2Code, systems that use GUI agents to evaluate and iteratively improve game generation by having AI agents play games rather than relying on one-shot code generation. Play2Code achieves 66.8% success on game rubrics through a dialogue loop between coding and playing agents, significantly outperforming baseline approaches.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning

Researchers introduce LiteGUI, a novel training framework that enhances lightweight GUI agents (2B-3B parameters) through reinforcement learning and knowledge distillation, achieving competitive performance with much larger models. The approach addresses key limitations of traditional supervised fine-tuning by incorporating multi-solution learning and dynamic retrieval mechanisms to reduce hallucinations in automated interface interaction tasks.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Researchers present a comprehensive framework for combining Reinforcement Learning with GUI agents to create more autonomous digital systems. The work identifies three key RL approaches (Offline, Online, and Hybrid), reveals emerging technical trends like world-model-based training and multi-tier reward architectures, and proposes a roadmap toward safer, more reliable automation systems.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Researchers introduce the 'Turing Test on Screen,' a framework for measuring how well autonomous GUI agents can mimic human behavior to evade detection systems. The study reveals that current LLM-based agents exhibit unnatural interaction patterns and proposes humanization methods to improve their ability to operate undetected in adversarial digital environments.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

Researchers propose Trajectory Induced Preference Optimization (TIPO), a novel method for training mobile GUI agents to respect user privacy preferences while maintaining task execution capability. The approach addresses the challenge that privacy-conscious users generate structurally different execution patterns than utility-focused users, requiring specialized optimization techniques to properly align agent behavior with individual privacy preferences.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression

Researchers developed ST-Lite, a training-free KV cache compression framework that accelerates GUI agents by 2.45x while using only 10-20% of the cache budget. The solution addresses memory and latency constraints in Vision-Language Models for autonomous GUI interactions through specialized attention pattern optimization.

AI · Neutral · Hugging Face Blog · Jun 6 · 4/10 · 5

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

ScreenSuite is introduced as a comprehensive evaluation suite specifically designed for GUI (Graphical User Interface) agents. The tool appears to provide testing and assessment capabilities for AI systems that interact with graphical interfaces.