🧠 AI | ⚪ Neutral | Importance 6/10

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

arXiv – CS AI | Mohamed Aghzal, Gregory J. Stein, Ziyu Yao | March 17, 2026 at 04:00 AM

🤖 AI Summary

Researchers propose a hierarchical planning framework for analyzing why LLM-based web agents fail at complex navigation tasks. The study finds that although plans written in the structured Planning Domain Definition Language (PDDL) outperform natural language plans, the primary bottlenecks are low-level execution and perceptual grounding rather than high-level reasoning.

Key Takeaways

→ LLM web agents still fall far short of human reliability on realistic, long-horizon web navigation tasks.
→ The proposed hierarchical framework evaluates agents across three layers: high-level planning, low-level execution, and replanning.
→ Structured PDDL plans produce more concise and goal-directed strategies than natural language plans.
→ Low-level execution, not high-level reasoning, remains the dominant bottleneck.
→ Improving perceptual grounding and adaptive control is critical for achieving human-level agent reliability.
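
To make the three layers in the takeaways concrete, here is a minimal Python sketch of a plan, execute, replan loop that records where an episode breaks down. It is an illustration only and not the authors' code: `run_episode`, `EpisodeLog`, `demo`, and the toy step names are all hypothetical, and the planner, executor, and replanner are plain callables standing in for an LLM planner and a browser controller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EpisodeLog:
    """Per-episode counters, split by layer (hypothetical structure)."""
    completed: List[str] = field(default_factory=list)
    execution_failures: int = 0  # low-level steps that did not succeed
    replans: int = 0             # times the replanning layer was invoked
    success: bool = False


def run_episode(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], bool],
    replan: Callable[[str, List[str], str], List[str]],
    max_replans: int = 2,
) -> EpisodeLog:
    """Run one plan -> execute -> replan episode and log failures by layer."""
    log = EpisodeLog()
    steps = list(plan(goal))  # high-level planning
    while steps:
        step = steps.pop(0)
        if execute(step):  # low-level execution
            log.completed.append(step)
            continue
        log.execution_failures += 1
        if log.replans >= max_replans:
            return log  # replanning budget exhausted, episode fails
        log.replans += 1
        # Replanning: ask for a fresh list of remaining steps.
        steps = list(replan(goal, log.completed, step))
    log.success = True
    return log


def demo() -> EpisodeLog:
    """Toy run: the plan is fine, but one click fails once before succeeding."""
    flaky = {"click_result"}  # assumed grounding failure on the first attempt

    def plan(goal: str) -> List[str]:
        return ["open_search", "type_query", "click_result"]

    def execute(step: str) -> bool:
        if step in flaky:
            flaky.discard(step)
            return False
        return True

    def replan(goal: str, done: List[str], failed: str) -> List[str]:
        return [failed]  # simplest possible recovery: retry the failed step

    return run_episode("find paper", plan, execute, replan)


if __name__ == "__main__":
    print(demo())
```

Keeping separate counters for execution failures and replans is the design choice that matters here: it lets an evaluation attribute a failed episode to a specific layer, which is the kind of breakdown the summary describes when it names low-level execution as the dominant bottleneck.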