🧠 AI | 🟢 Bullish | Importance 6/10

Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression

arXiv – CS AI | Bowen Zhou, Zhou Xu, Wanli Li, Jingyu Xiao, Haoqian Wang | March 3, 2026 at 05:00 AM

🤖 AI Summary

Researchers developed ST-Lite, a training-free KV cache compression framework that speeds up decoding for GUI agents by 2.45x while retaining only 10-20% of the KV cache budget. It targets the memory and latency costs of running Vision-Language Models on long-horizon autonomous GUI interactions by exploiting attention patterns specific to GUI tasks.
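To give a sense of what a 10-20% cache budget means in memory terms, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the 20,000-token context are hypothetical numbers chosen for illustration; they are not taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to store keys and values for one sequence across all layers."""
    # Factor of 2 accounts for storing both a key and a value per cached token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model shape and context length (assumptions, not from the paper).
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 32, 8, 128, 20_000

full = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
for budget in (0.10, 0.20):
    kept_tokens = round(CONTEXT * budget)
    kept = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, kept_tokens)
    print(f"budget {budget:.0%}: {kept / 2**20:.0f} MiB of {full / 2**20:.0f} MiB")
```

Under these assumed numbers the full cache is 2500 MiB, so a 10% budget keeps 250 MiB and a 20% budget keeps 500 MiB. Cache size scales linearly with the number of retained tokens, which is why a long GUI trajectory with many screenshots is where compression pays off most.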

Key Takeaways

→ ST-Lite achieves a 2.45x decoding speedup for GUI agents while maintaining performance comparable to full-cache baselines.
→ The framework retains only 10-20% of the full KV cache budget, substantially reducing the memory footprint of VLMs.
→ Attention in GUI tasks is uniformly highly sparse across all transformer layers, unlike in general visual tasks.
→ The method introduces two components, Component-centric Spatial Saliency and Trajectory-aware Semantic Gating, to guide cache compression.
→ Because it requires no training, the approach offers a scalable option for resource-constrained autonomous GUI agents.
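The general idea behind budget-based KV cache compression can be sketched as follows. This is not ST-Lite's algorithm: the summary does not describe how Component-centric Spatial Saliency or Trajectory-aware Semantic Gating compute their scores, so the sketch swaps in a generic heuristic (keep the cached tokens with the highest accumulated attention, plus a small window of the most recent ones). The function name, the `recent` parameter, and the example scores are all hypothetical.

```python
def select_kv_to_keep(attn_scores, budget_ratio=0.2, recent=4):
    """Return sorted indices of cached tokens to keep under a cache budget.

    attn_scores: one importance score per cached token (assumed here to be
    accumulated attention; a real method would supply its own scoring).
    """
    n = len(attn_scores)
    budget = max(1, recent, round(n * budget_ratio))
    if budget >= n:
        return list(range(n))  # nothing to evict

    # Always keep the most recent tokens, regardless of score.
    recent_idx = list(range(n - recent, n))
    # Rank the older tokens by score, highest first.
    older = sorted(range(n - recent), key=lambda i: attn_scores[i], reverse=True)
    keep = older[: budget - len(recent_idx)] + recent_idx
    return sorted(keep)


# Ten cached tokens with made-up importance scores.
scores = [0.9, 0.1, 0.05, 0.7, 0.02, 0.3, 0.01, 0.6, 0.2, 0.4]
keep = select_kv_to_keep(scores, budget_ratio=0.5, recent=2)
print(keep)  # [0, 3, 7, 8, 9]

# Apply the same index selection to the cached keys and values.
keys = [f"k{i}" for i in range(len(scores))]
values = [f"v{i}" for i in range(len(scores))]
keys = [keys[i] for i in keep]
values = [values[i] for i in keep]
```

Because selection of this kind happens at inference time and only reads scores the model already produces, no retraining is involved, which matches the training-free property the takeaways describe.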