y0news

#web-automation News & Analysis

7 articles tagged with #web-automation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 1 · 7/10

Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation

Researchers propose a Compile-and-Execute architecture that reduces LLM-driven web automation costs from $150 to under $0.10 per workflow by decoupling reasoning from execution. Instead of continuous inference loops, a single LLM call generates a deterministic JSON blueprint that a lightweight runtime executes without additional model queries, achieving 80-94% zero-shot success rates.
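The split between one-time reasoning and model-free execution can be illustrated with a minimal sketch. It is not the paper's implementation: the blueprint schema, the `FakeBrowser` stand-in, and the `execute` function are all hypothetical names chosen for illustration, and a real runtime would drive an actual browser rather than record calls.

```python
import json

# Hypothetical blueprint, standing in for the output of the single LLM call.
# The schema (steps with "action", "selector", ...) is an assumption.
BLUEPRINT = json.loads("""
{
  "steps": [
    {"action": "goto",  "url": "https://example.com/login"},
    {"action": "fill",  "selector": "#user", "value": "alice"},
    {"action": "click", "selector": "#submit"}
  ]
}
""")


class FakeBrowser:
    """Stand-in for a real browser driver; it only records what it is asked to do."""

    def __init__(self):
        self.log = []

    def goto(self, url):
        self.log.append(("goto", url))

    def fill(self, selector, value):
        self.log.append(("fill", selector, value))

    def click(self, selector):
        self.log.append(("click", selector))


def execute(blueprint, browser):
    """Run each blueprint step in order; no model is queried at this stage."""
    handlers = {
        "goto": lambda s: browser.goto(s["url"]),
        "fill": lambda s: browser.fill(s["selector"], s["value"]),
        "click": lambda s: browser.click(s["selector"]),
    }
    for i, step in enumerate(blueprint["steps"]):
        handler = handlers.get(step["action"])
        if handler is None:
            # Fail loudly instead of falling back to another inference call.
            raise ValueError(f"step {i}: unknown action {step['action']!r}")
        handler(step)
    return browser.log
```

Because the runtime only dispatches on fixed action names, rerunning the same workflow costs no further inference under this sketch; only generating a new blueprint would.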

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

WebDS: An End-to-End Benchmark for Web-based Data Science

Researchers introduce WebDS, a new benchmark for evaluating AI agents on real-world web-based data science tasks across 870 scenarios and 29 websites. Current state-of-the-art LLM agents achieve only 15% success rates compared to 90% human accuracy, revealing significant gaps in AI capabilities for complex data workflows.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Researchers introduce PlanAhead, a framework that systematically evaluates how different natural language plan representations affect LLM-based web agent performance across multiple AI models. The study finds that both the plan formulation method and underlying LLM significantly impact agent robustness, with implications for improving autonomous AI systems that interact with web interfaces.

🏢 OpenAI
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Tuning Qwen2.5-VL to Improve Its Web Interaction Skills

Researchers fine-tuned Qwen2.5-VL-32B, a leading open-source vision-language model, to improve its ability to autonomously perform web interactions through visual input alone. Using a two-stage training approach that addresses cursor localization, instruction sensitivity, and overconfidence bias, the model's success rate on single-click web tasks improved from 86% to 94%.

AI · Bullish · arXiv – CS AI · Mar 6 · 6/10

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Researchers propose STRUCTUREDAGENT, a new AI framework that uses hierarchical planning with AND/OR trees to improve web agent performance on complex, long-horizon tasks. The system addresses limitations in current LLM-based agents through better memory tracking and structured planning approaches.
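For readers unfamiliar with AND/OR trees, the following is a generic sketch of the data structure, not STRUCTUREDAGENT's algorithm. An AND node is solved only when all of its subgoals are solved, while an OR node is solved when any one alternative succeeds. The `Node` class, the `solved` and `next_leaf` helpers, and the flight-booking task are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "LEAF"                      # "AND", "OR", or "LEAF"
    children: list = field(default_factory=list)
    done: bool = False                      # only meaningful for leaves


def solved(node):
    """AND needs every child solved; OR needs at least one; a leaf uses its flag."""
    if node.kind == "LEAF":
        return node.done
    results = [solved(child) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


def next_leaf(node):
    """Return the first unsolved leaf still worth attempting, or None."""
    if solved(node):
        return None                         # nothing left to do in this subtree
    if node.kind == "LEAF":
        return node
    for child in node.children:
        leaf = next_leaf(child)
        if leaf is not None:
            return leaf
    return None


# Hypothetical task: booking requires finding a flight (either way) AND paying.
by_date = Node("search_by_date")
by_price = Node("search_by_price")
pay = Node("pay")
root = Node("book_flight", "AND", [Node("find_flight", "OR", [by_date, by_price]), pay])
```

In this sketch, marking `by_price.done = True` satisfies the OR subgoal, so `next_leaf(root)` skips the remaining search alternative and moves on to `pay`.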

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

On the Suitability of LLM-Driven Agents for Dark Pattern Audits

Researchers evaluated LLM-driven agents' ability to identify dark patterns in web interfaces, specifically testing on 456 data broker websites processing CCPA data rights requests. The study examined whether AI agents can reliably detect manipulative design elements that discourage users from exercising their privacy rights.