AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers introduce JAMEL, a framework that trains AI agents to explore open-ended environments more effectively by jointly developing memory systems and exploration policies through novelty-driven learning. The approach uses natural supervisory signals such as code coverage to train compressed memory representations, achieving exploration capabilities that rival closed-source models while reducing token consumption.
AI · Bullish · arXiv – CS AI · May 28 · 7/10
🧠MobileGym is a new browser-based simulation platform designed to accelerate mobile GUI agent research by enabling verifiable outcomes and scalable parallel training. The platform supports 416 parameterized tasks across 28 apps and demonstrates strong sim-to-real transfer, with a trained model retaining 95.1% of simulation gains on real devices.
AI · Bullish · arXiv – CS AI · May 4 · 7/10
🧠Researchers introduce A11y-Compressor, a framework that optimizes how AI agents interpret graphical user interfaces by transforming accessibility trees into more efficient representations. The approach reduces input tokens by 78% while simultaneously improving task success rates by 5.1 percentage points, addressing a critical bottleneck in GUI automation systems.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers propose MGA (Memory-Driven GUI Agent), a minimalist AI framework that improves GUI automation by decoupling long-horizon tasks into independent steps linked through structured state memory. The approach addresses critical limitations in current multimodal AI agents—context overload and architectural redundancy—while maintaining competitive performance with reduced complexity.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers developed CES, a multi-agent framework that uses reinforcement learning to improve GUI automation for long-horizon tasks. The system uses a Coordinator for planning and a State Tracker for context management, and it can integrate with any low-level Executor model to significantly enhance performance on complex automated tasks.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduced HLL (Humanity's Last Line of Verification), a benchmark testing whether multimodal AI agents can bypass CAPTCHA protections designed to verify human users. Testing eight frontier models revealed significant brittleness: agent performance varies sharply across CAPTCHA types, degrades under realistic conditions, and fails when solutions must be supported by valid action traces, exposing gaps in localization, action calibration, and process consistency.
AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce STaR-KV, a training-free compression framework that reduces key-value cache memory consumption in vision-language GUI agents by up to 40% while maintaining accuracy. The method addresses a critical bottleneck where models like UI-TARS-1.5-7B consume prohibitive GPU memory during multi-step interactions, enabling more practical deployment on standard accelerators.
AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠Researchers introduce UI-KOBE, a framework that enhances lightweight mobile GUI agents by combining them with app-specific knowledge graphs to enable more reliable task automation on mobile devices. This approach reduces dependency on large vision-language models, lowering inference costs and improving privacy by enabling on-device deployment without sacrificing performance.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers developed and evaluated mobile world models across four modalities (delta text, full text, diffusion images, and renderable code) to guide GUI agents in executing smartphone tasks. The study reveals that renderable code provides the best in-distribution fidelity while text-based models are more robust for out-of-distribution execution, and that world-model-generated trajectories can improve agent training despite not preserving original data distributions.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠AgentProg introduces a program-guided context management system for long-horizon GUI agents that addresses a critical bottleneck: the growing overhead of interaction history. By reframing interaction history as structured programs with variables and control flow, the approach preserves semantic information while reducing context requirements, achieving state-of-the-art performance on the AndroidWorld benchmark while maintaining robustness on extended tasks.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce CRAFT-GUI, a curriculum learning framework that uses reinforcement learning to improve AI agents' performance on graphical user interface tasks. The method addresses difficulty variation across GUI tasks and provides more nuanced feedback, achieving a 5.6% improvement on Android Control benchmarks and a 10.3% improvement on internal benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers have developed State-aware Reasoning (StaR), a new multimodal AI method that significantly improves AI agents' ability to interact with graphical user interfaces, particularly with toggle controls. The method enables agents to better perceive current states and execute instructions accordingly, improving toggle execution accuracy by over 30%.
AI · Bullish · Hugging Face Blog · Sep 23 · 6/10
🧠Smol2Operator introduces post-training GUI agents designed for computer-use applications. The development represents an advance in AI agents capable of interacting with graphical user interfaces autonomously.
AI · Bullish · Hugging Face Blog · Jun 3 · 6/10
🧠Holo1 is a new family of Vision-Language Models (VLMs) designed specifically for GUI automation, powering the GUI agent Surfer-H. This development advances AI's ability to interact with graphical user interfaces autonomously.