y0news

#computer-agents News & Analysis

4 articles tagged with #computer-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 27 · 7/10

WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing

Researchers introduced WebTestBench, a new benchmark for evaluating automated web testing using AI agents and large language models. The study reveals significant gaps between current AI capabilities and industrial deployment needs, with LLMs struggling with test completeness, defect detection, and long-term interaction reliability.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents

Researchers have developed Declarative Model Interface (DMI), a new abstraction layer that transforms traditional GUIs into LLM-friendly interfaces for computer-use agents. Testing with Microsoft Office Suite showed 67% improvement in task success rates and 43.5% reduction in interaction steps, with over 61% of tasks completed in a single LLM call.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 4

VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents

Researchers have identified critical security vulnerabilities in Computer-Use Agents (CUAs) through Visual Prompt Injection attacks, where malicious instructions are embedded in user interfaces. Their VPI-Bench study shows CUAs can be deceived at rates up to 51% and Browser-Use Agents up to 100% on certain platforms, with current defenses proving inadequate.

AI · Bullish · OpenAI News · Jun 23 · 7/10 · 5

Learning to play Minecraft with Video PreTraining

Researchers developed a neural network that learned to play Minecraft using Video PreTraining (VPT) on massive unlabeled human gameplay footage with minimal labeled data. The AI can craft diamond tools through standard keyboard and mouse inputs, representing progress toward general-purpose computer-using agents.