y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#web-automation News & Analysis

5 articles tagged with #web-automation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AINeutralarXiv โ€“ CS AI ยท Mar 56/10
๐Ÿง 

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.

AINeutralarXiv โ€“ CS AI ยท Mar 56/10
๐Ÿง 

WebDS: An End-to-End Benchmark for Web-based Data Science

Researchers introduce WebDS, a new benchmark for evaluating AI agents on real-world web-based data science tasks across 870 scenarios and 29 websites. Current state-of-the-art LLM agents achieve only 15% success rates compared to 90% human accuracy, revealing significant gaps in AI capabilities for complex data workflows.

AIBullisharXiv โ€“ CS AI ยท 6d ago6/10
๐Ÿง 

Tuning Qwen2.5-VL to Improve Its Web Interaction Skills

Researchers fine-tuned Qwen2.5-VL-32B, a leading open-source vision-language model, to improve its ability to autonomously perform web interactions through visual input alone. Using a two-stage training approach that addresses cursor localization, instruction sensitivity, and overconfidence bias, the model's success rate on single-click web tasks improved from 86% to 94%.

AIBullisharXiv โ€“ CS AI ยท Mar 66/10
๐Ÿง 

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Researchers propose STRUCTUREDAGENT, a new AI framework that uses hierarchical planning with AND/OR trees to improve web agent performance on complex, long-horizon tasks. The system addresses limitations in current LLM-based agents through better memory tracking and structured planning approaches.

AINeutralarXiv โ€“ CS AI ยท Mar 54/10
๐Ÿง 

On the Suitability of LLM-Driven Agents for Dark Pattern Audits

Researchers evaluated LLM-driven agents' ability to identify dark patterns in web interfaces, specifically testing on 456 data broker websites processing CCPA data rights requests. The study examined whether AI agents can reliably detect manipulative design elements that discourage users from exercising their privacy rights.