y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#web-agents News & Analysis

9 articles tagged with #web-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

9 articles
AIBearisharXiv โ€“ CS AI ยท Apr 67/10
๐Ÿง 

Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

Researchers have discovered a new attack called eTAMP that can poison AI web agents' memory through environmental observation alone, achieving cross-session compromise rates up to 32.5%. The vulnerability affects major models including GPT-5-mini and becomes significantly worse when agents are under stress, highlighting critical security risks as AI browsers gain adoption.

๐Ÿข Perplexity๐Ÿง  GPT-5๐Ÿง  ChatGPT
AIBearisharXiv โ€“ CS AI ยท Mar 167/10
๐Ÿง 

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Researchers have released MalURLBench, the first benchmark to evaluate how LLM-based web agents handle malicious URLs, revealing significant vulnerabilities across 12 popular models. The study found that existing AI agents struggle to detect disguised malicious URLs and proposed URLGuard as a defensive solution.

AIBullisharXiv โ€“ CS AI ยท Mar 67/10
๐Ÿง 

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

WebFactory introduces a fully automated reinforcement learning pipeline that efficiently transforms large language models into GUI agents without requiring unsafe live web interactions or costly human-annotated data. The system demonstrates exceptional data efficiency by achieving comparable performance to human-trained agents while using synthetic data from only 10 websites.

AIBullisharXiv โ€“ CS AI ยท Mar 57/10
๐Ÿง 

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks where adversaries inject malicious content into webpages to deceive both visual and text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.

AINeutralarXiv โ€“ CS AI ยท Mar 176/10
๐Ÿง 

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Researchers propose a hierarchical planning framework to analyze why LLM-based web agents fail at complex navigation tasks. The study reveals that while structured PDDL plans outperform natural language plans, low-level execution and perceptual grounding remain the primary bottlenecks rather than high-level reasoning.

AIBullisharXiv โ€“ CS AI ยท Mar 166/10
๐Ÿง 

AI Planning Framework for LLM-Based Web Agents

Researchers introduce a formal planning framework that maps LLM-based web agents to traditional search algorithms, enabling better diagnosis of failures in autonomous web tasks. The study compares different agent architectures using novel evaluation metrics and a dataset of 794 human-labeled trajectories from WebArena benchmark.

AIBullisharXiv โ€“ CS AI ยท Mar 36/1010
๐Ÿง 

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent

Researchers have released DeepResearch-9K, a large-scale dataset with 9,000 questions across three difficulty levels designed to train and benchmark AI research agents. The accompanying open-source framework DeepResearch-R1 supports multi-turn web interactions and reinforcement learning approaches for developing more sophisticated AI research capabilities.

AIBearisharXiv โ€“ CS AI ยท Mar 36/108
๐Ÿง 

Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents

Researchers identified widespread TOCTOU (time of check to time of use) vulnerabilities in browser-use agents, where web pages change between planning and execution phases, potentially causing unintended actions. A study of 10 popular open-source agents revealed these security flaws are common, prompting development of a lightweight mitigation strategy based on pre-execution validation.