Web Agents Should Use Typed Actions Instead of Click-Based Browsing
A research paper proposes replacing click-based web automation with typed actions backed by semantic APIs, arguing this shift would make AI agents more reliable, auditable, and cost-effective. The authors introduce 'web verbs' as a standardized interface for web operations that could improve agent behavior and enable trustworthy automation at scale.
The paper addresses a fundamental architectural problem in current web automation: AI agents built on low-level primitives like clicks and DOM manipulation produce fragile systems prone to failure and difficult to audit. This matters because web agents are increasingly deployed for autonomous task execution across e-commerce, data retrieval, and process automation—domains where reliability directly impacts business operations and user trust.
The current approach treats web interaction as raw input and output, forcing agents to learn brittle pixel- and DOM-level patterns that break when interfaces change. The proposed alternative adds a semantic abstraction layer in which each web operation exposes structured inputs, structured outputs, and documented behavior. This mirrors an established pattern in software engineering: APIs hide implementation details so that callers can compose operations and reason about them.
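To make the idea of a typed action concrete, here is a minimal Python sketch of what one web verb might look like. It is an illustration under stated assumptions, not the paper's interface: the names `WebVerb`, `SearchInput`, `Product`, and `search_products`, as well as the in-memory `CATALOG`, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass(frozen=True)
class SearchInput:
    """Structured input: what the verb needs, with types and defaults."""
    query: str
    max_results: int = 10


@dataclass(frozen=True)
class Product:
    """Structured output: one search hit."""
    sku: str
    title: str
    price_cents: int


@dataclass(frozen=True)
class WebVerb(Generic[In, Out]):
    """A named operation with documented behavior and a typed signature."""
    name: str
    description: str
    run: Callable[[In], Out]


# Hypothetical in-memory catalog standing in for a real site's backend.
CATALOG = [
    Product("MUG-1", "Blue ceramic mug", 1200),
    Product("MUG-2", "Travel mug, steel", 2400),
    Product("TEA-1", "Green tea, 50 bags", 800),
]


def _search(inp: SearchInput) -> List[Product]:
    # Case-insensitive substring match on the title, capped at max_results.
    needle = inp.query.lower()
    hits = [p for p in CATALOG if needle in p.title.lower()]
    return hits[: inp.max_results]


search_products: WebVerb[SearchInput, List[Product]] = WebVerb(
    name="search_products",
    description="Return catalog items whose title contains the query.",
    run=_search,
)

# The agent calls the verb with typed arguments instead of locating a
# search box, typing into it, and clicking a button.
results = search_products.run(SearchInput(query="mug", max_results=5))
print([p.sku for p in results])
```

Because the input and output are plain typed records, a redesign of the page would not change how the agent calls the verb; only the verb's implementation would need updating.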
For the AI and automation industry, this shift has practical implications. Enterprises deploying agentic systems face hidden costs from failure recovery and forensic debugging when agents behave unpredictably. A typed action layer with preconditions, postconditions, and logging could reduce those costs and enable compliance verification, which is critical in regulated industries. Developers would benefit from clearer composition models and better testability.
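One way to picture preconditions, postconditions, and logging working together is the Python sketch below. It is a hedged illustration only: `CheckedVerb`, `ContractViolation`, the `add_to_cart` verb, and the shape of the audit log entries are assumptions made for this example, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Args = Dict[str, Any]
Result = Dict[str, Any]


class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


@dataclass
class CheckedVerb:
    name: str
    precondition: Callable[[Args], bool]
    action: Callable[[Args], Result]
    postcondition: Callable[[Args, Result], bool]

    def __call__(self, args: Args, audit_log: List[Dict[str, Any]]) -> Result:
        # Every call is recorded, including calls that fail their contract.
        entry: Dict[str, Any] = {"verb": self.name, "args": dict(args), "status": "started"}
        audit_log.append(entry)
        if not self.precondition(args):
            entry["status"] = "precondition_failed"
            raise ContractViolation(f"{self.name}: precondition failed")
        result = self.action(args)
        if not self.postcondition(args, result):
            entry["status"] = "postcondition_failed"
            raise ContractViolation(f"{self.name}: postcondition failed")
        entry["status"] = "ok"
        entry["result"] = result
        return result


# Hypothetical cart state standing in for a real site's session.
cart: Dict[str, int] = {}


def _add(args: Args) -> Result:
    cart[args["sku"]] = cart.get(args["sku"], 0) + args["quantity"]
    return {"sku": args["sku"], "quantity_in_cart": cart[args["sku"]]}


add_to_cart = CheckedVerb(
    name="add_to_cart",
    # Precondition: quantity must be a positive integer.
    precondition=lambda a: isinstance(a.get("quantity"), int) and a["quantity"] > 0,
    action=_add,
    # Postcondition: the cart holds at least the requested quantity.
    postcondition=lambda a, r: r["quantity_in_cart"] >= a["quantity"],
)

audit_log: List[Dict[str, Any]] = []
add_to_cart({"sku": "MUG-1", "quantity": 2}, audit_log)
try:
    add_to_cart({"sku": "MUG-1", "quantity": 0}, audit_log)  # rejected before acting
except ContractViolation:
    pass
print([e["status"] for e in audit_log])
```

In this sketch the invalid call is rejected before any state changes, and the log keeps a record of both the successful and the rejected attempt, which is the kind of evidence a compliance review could inspect.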
The proposal requires significant coordination: web platforms would need to expose verb interfaces, agent frameworks would need to support verb-based agents, and the community would need to standardize verb definitions. An arXiv paper does not move markets on its own, but this one reflects mature thinking about agentic infrastructure. The gap between today's fragile agents and the reliability that production deployments require creates demand for such a layer. Success depends on adoption by major platforms and tool vendors willing to invest in semantic abstraction layers.
- Current click-based web agents are brittle, expensive to maintain, and produce unpredictable behavior due to reliance on low-level DOM manipulation primitives.
- Typed web verbs (structured function interfaces with defined inputs, outputs, and behavior) could replace click-based automation and improve reliability and auditability.
- This architectural shift parallels successful API design patterns and would enable agents to synthesize explicit control flows with checkable execution traces.
- Adoption requires ecosystem coordination: web platforms exposing verb interfaces, agent frameworks supporting typed actions, and standardization across the community.
- The semantic abstraction layer directly addresses hidden costs in enterprise automation systems, particularly in regulated industries requiring compliance verification.
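The takeaway about explicit control flows and checkable execution traces can be illustrated with a short Python sketch. Everything here is hypothetical and chosen only for illustration: the `find_flight` and `book_flight` verbs, their return values, the `TraceStep` record, and the budget rule that the checker enforces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class TraceStep:
    verb: str
    args: Dict[str, Any]
    result: Dict[str, Any]


# Hypothetical verbs with canned results standing in for real site calls.
def find_flight(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"flight_id": "F100", "price": 240, "destination": args["destination"]}


def book_flight(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"confirmation": "CONF-" + args["flight_id"]}


VERBS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "find_flight": find_flight,
    "book_flight": book_flight,
}


def run_workflow(destination: str, budget: int) -> List[TraceStep]:
    """Explicit control flow over verbs; every call is appended to the trace."""
    trace: List[TraceStep] = []

    def call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        result = VERBS[name](args)
        trace.append(TraceStep(name, args, result))
        return result

    flight = call("find_flight", {"destination": destination})
    if flight["price"] <= budget:  # the branch is visible in code, not hidden in clicks
        call("book_flight", {"flight_id": flight["flight_id"]})
    return trace


def trace_respects_budget(trace: List[TraceStep], budget: int) -> bool:
    """Check after the fact: each booking refers to a found flight within budget."""
    offers: Dict[str, int] = {}
    for step in trace:
        if step.verb == "find_flight":
            offers[step.result["flight_id"]] = step.result["price"]
        elif step.verb == "book_flight":
            price = offers.get(step.args["flight_id"])
            if price is None or price > budget:
                return False
    return True


trace = run_workflow("LIS", budget=300)
print([step.verb for step in trace], trace_respects_budget(trace, 300))
```

The checker never re-runs the workflow; it only reads the recorded steps, so the same trace could be audited later by a different party or against a stricter rule.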