y0news

#gui-agents News & Analysis

10 articles tagged with #gui-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠

Mobile GUI Agents under Real-world Threats: Are We There Yet?

Researchers have identified critical vulnerabilities in mobile GUI agents powered by large language models, revealing that third-party content in real-world apps causes these agents to fail significantly more often than benchmark tests suggest. Tests on 122 dynamic tasks and over 3,000 static scenarios show misleading rates (the share of cases in which an agent was misled) of 36-42%, raising serious concerns about deploying these agents in commercial settings.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠

Hybrid Self-evolving Structured Memory for GUI Agents

Researchers developed HyMEM, a brain-inspired hybrid memory system that significantly improves GUI agents' ability to interact with computers. The system uses graph-based structured memory that combines symbolic nodes with trajectory embeddings, enabling smaller 7B/8B models to match or exceed the performance of larger closed-source models like GPT-4o.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
🧠

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

WebFactory introduces a fully automated reinforcement learning pipeline that efficiently transforms large language models into GUI agents without requiring unsafe live web interactions or costly human-annotated data. The system demonstrates exceptional data efficiency by achieving comparable performance to human-trained agents while using synthetic data from only 10 websites.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Researchers discovered a critical security vulnerability in AI-powered GUI agents on Android, where malicious apps can hijack agent actions without requiring dangerous permissions. The 'Action Rebinding' attack exploits timing gaps between AI observation and action, achieving 100% success rates in tests across six popular Android GUI agents.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠

Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents

Researchers introduce GUIPruner, a training-free framework that addresses efficiency bottlenecks in high-resolution GUI agents by eliminating spatiotemporal redundancy. The system achieves a 3.4x reduction in computational operations and a 3.3x speedup while maintaining 94% of the original performance, enabling real-time navigation with minimal resource consumption.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Researchers introduce the 'Turing Test on Screen,' a framework for measuring how well autonomous GUI agents can mimic human behavior to evade detection systems. The study reveals that current LLM-based agents exhibit unnatural interaction patterns and proposes humanization methods to improve their ability to operate undetected in adversarial digital environments.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠

Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

Researchers propose Trajectory Induced Preference Optimization (TIPO), a novel method for training mobile GUI agents to respect user privacy preferences while maintaining task execution capability. The approach addresses the challenge that privacy-conscious users generate structurally different execution patterns than utility-focused users, requiring specialized optimization techniques to properly align agent behavior with individual privacy preferences.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10
🧠

Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression

Researchers developed ST-Lite, a training-free KV cache compression framework that accelerates GUI agents by 2.45x while using only 10-20% of the cache budget. The solution addresses memory and latency constraints in Vision-Language Models for autonomous GUI interactions through specialized attention pattern optimization.

AI · Neutral · Hugging Face Blog · Jun 6 · 4/10 · 5
🧠

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

ScreenSuite is introduced as a comprehensive evaluation suite specifically designed for GUI (Graphical User Interface) agents. The tool appears to provide testing and assessment capabilities for AI systems that interact with graphical interfaces.