9 articles tagged with #web-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBearisharXiv โ CS AI ยท Apr 67/10
๐ง Researchers have discovered a new attack called eTAMP that can poison AI web agents' memory through environmental observation alone, achieving cross-session compromise rates up to 32.5%. The vulnerability affects major models including GPT-5-mini and becomes significantly worse when agents are under stress, highlighting critical security risks as AI browsers gain adoption.
๐ข Perplexity๐ง GPT-5๐ง ChatGPT
AIBearisharXiv โ CS AI ยท Mar 167/10
๐ง Researchers have released MalURLBench, the first benchmark to evaluate how LLM-based web agents handle malicious URLs, revealing significant vulnerabilities across 12 popular models. The study found that existing AI agents struggle to detect disguised malicious URLs and proposed URLGuard as a defensive solution.
AIBullisharXiv โ CS AI ยท Mar 67/10
๐ง WebFactory introduces a fully automated reinforcement learning pipeline that efficiently transforms large language models into GUI agents without requiring unsafe live web interactions or costly human-annotated data. The system demonstrates exceptional data efficiency by achieving comparable performance to human-trained agents while using synthetic data from only 10 websites.
AIBullisharXiv โ CS AI ยท Mar 57/10
๐ง Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks where adversaries inject malicious content into webpages to deceive both visual and text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.
AINeutralarXiv โ CS AI ยท Mar 176/10
๐ง Researchers propose a hierarchical planning framework to analyze why LLM-based web agents fail at complex navigation tasks. The study reveals that while structured PDDL plans outperform natural language plans, low-level execution and perceptual grounding remain the primary bottlenecks rather than high-level reasoning.
AIBullisharXiv โ CS AI ยท Mar 166/10
๐ง Researchers introduce a formal planning framework that maps LLM-based web agents to traditional search algorithms, enabling better diagnosis of failures in autonomous web tasks. The study compares different agent architectures using novel evaluation metrics and a dataset of 794 human-labeled trajectories from WebArena benchmark.
AIBearisharXiv โ CS AI ยท Mar 37/108
๐ง Researchers introduced the Synthetic Web Benchmark, revealing that frontier AI language models fail catastrophically when exposed to high-plausibility misinformation in search results. The study shows current AI agents struggle to handle conflicting information sources, with accuracy collapsing despite access to truthful content.
AIBullisharXiv โ CS AI ยท Mar 36/1010
๐ง Researchers have released DeepResearch-9K, a large-scale dataset with 9,000 questions across three difficulty levels designed to train and benchmark AI research agents. The accompanying open-source framework DeepResearch-R1 supports multi-turn web interactions and reinforcement learning approaches for developing more sophisticated AI research capabilities.
AIBearisharXiv โ CS AI ยท Mar 36/108
๐ง Researchers identified widespread TOCTOU (time of check to time of use) vulnerabilities in browser-use agents, where web pages change between planning and execution phases, potentially causing unintended actions. A study of 10 popular open-source agents revealed these security flaws are common, prompting development of a lightweight mitigation strategy based on pre-execution validation.