Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
Researchers introduce State-Grounded Dynamic Retrieval (SGDR), a method that lets language agents reuse learned skills dynamically during web automation tasks. By matching skills to both the task goal and the current webpage state, rather than fixing a skill set at the start of a task, SGDR achieves a 10.6% relative performance gain over existing approaches on complex multi-step web tasks.
SGDR addresses a fundamental limitation in current web automation systems: the mismatch between static skill retrieval and dynamic execution environments. Previous approaches treat skill selection as a one-time decision based on the initial task instruction, so they fail to adapt when the webpage state changes unexpectedly. This research shows that awareness of intermediate page states significantly improves agent performance, suggesting a path toward more robust autonomous systems.
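To make the static-versus-dynamic contrast concrete, here is a minimal sketch. It assumes skills are retrieved with a toy word-overlap score and that the page state is available as a short text summary; the skill names, page states, and helper functions are hypothetical, and SGDR's actual retriever is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # natural-language summary used for matching


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of word sets; a toy stand-in for an embedding model."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve(library: list[Skill], query: str) -> Skill:
    """Return the skill whose description best matches the query."""
    return max(library, key=lambda s: overlap(s.description, query))


def static_plan(library: list[Skill], instruction: str, states: list[str]) -> list[str]:
    # Static retrieval: choose once from the instruction, reuse at every step.
    chosen = retrieve(library, instruction)
    return [chosen.name for _ in states]


def dynamic_plan(library: list[Skill], instruction: str, states: list[str]) -> list[str]:
    # Dynamic retrieval: re-query at each step with the goal plus the current page state.
    return [retrieve(library, f"{instruction} {state}").name for state in states]


# Hypothetical skill library and page-state summaries.
library = [
    Skill("search_product", "search the store for a product by name"),
    Skill("log_in", "log in page enter username and password"),
]
instruction = "search the store for a red backpack"
states = ["store home page with search bar", "log in page asking for username and password"]
print(static_plan(library, instruction, states))   # ['search_product', 'search_product']
print(dynamic_plan(library, instruction, states))  # ['search_product', 'log_in']
```

When a login page interrupts the search, only the per-step retriever switches to the `log_in` skill; the static plan keeps replaying the skill it chose from the instruction alone.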
The technical innovation lies in three components working together: sliding-window extraction converts task trajectories into reusable procedures, a dual text-code representation pairs natural-language skill descriptions with executable code to make skill matching more precise, and state-grounded retrieval matches skills against both the task goal and the current webpage conditions. These mechanisms mirror how humans perform complex tasks: people continually reassess their tools and strategies as conditions change rather than rigidly following an initial plan.
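The three components can be sketched together in a few dozen lines. This is a minimal illustration under stated assumptions, not SGDR's implementation: it assumes a trajectory is a list of (page state, action) steps, that a skill's text view is a description and its code view is the replayable action list, and that retrieval mixes goal similarity and state similarity with a hypothetical weight `alpha`. The word-overlap similarity, the action strings, and the `describe` stub (standing in for an LLM summarizer) are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    page_state: str  # text summary of the page before the action
    action: str      # executable action string, e.g. "click(orders)"


@dataclass
class Skill:
    description: str   # text view: natural-language summary of the procedure
    code: list[str]    # code view: the action sequence to replay
    precondition: str  # page state in which the procedure started


def sliding_windows(trajectory: list[Step], size: int, stride: int = 1) -> list[list[Step]]:
    """Every contiguous run of `size` steps, advancing the window by `stride`."""
    return [trajectory[i:i + size] for i in range(0, len(trajectory) - size + 1, stride)]


def to_skill(window: list[Step], describe: Callable[[list[str]], str]) -> Skill:
    """Build a dual text-code skill from one window of a trajectory."""
    actions = [step.action for step in window]
    return Skill(description=describe(actions), code=actions, precondition=window[0].page_state)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a toy stand-in for a learned similarity model."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def state_grounded_score(skill: Skill, goal: str, state: str, alpha: float = 0.5) -> float:
    """Mix goal match (via the text view) with state match (via the precondition)."""
    return alpha * jaccard(skill.description, goal) + (1 - alpha) * jaccard(skill.precondition, state)


# Hypothetical demo trajectory; the lambda stands in for an LLM summarizer.
trajectory = [
    Step("login page", "type(username)"),
    Step("login page filled", "click(submit)"),
    Step("dashboard page", "click(orders)"),
    Step("orders page", "click(export)"),
]
library = [
    to_skill(w, describe=lambda acts: " then ".join(acts))
    for w in sliding_windows(trajectory, size=2)
]
best = max(
    library,
    key=lambda s: state_grounded_score(s, goal="export my orders", state="dashboard page"),
)
print(best.code)  # ['click(orders)', 'click(export)']
```

Setting `alpha=1.0` ignores the page state and reduces the score to goal-only matching, which approximates the instruction-based selection that static retrieval relies on.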
Experiments on WebArena across five domains support the approach's effectiveness. A 37.5% success rate with GPT-4.1 represents meaningful progress on a benchmark whose tasks span e-commerce, content management, and other domains. The 10.6% relative improvement over strong baselines suggests that dynamic retrieval captures important patterns that static methods miss.
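Because the article reports a relative gain alongside an absolute success rate, it helps to keep the two kinds of difference apart. The sketch below assumes, which the article does not state explicitly, that the 10.6% relative gain and the 37.5% success rate describe the same GPT-4.1 comparison; under that assumption the implied baseline is derived arithmetic, not a figure reported in the source.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100


def implied_baseline(new: float, rel_gain_pct: float) -> float:
    """Invert relative_gain: the baseline that a given relative gain implies."""
    return new / (1 + rel_gain_pct / 100)


# Assumption: both figures come from the same comparison.
base = implied_baseline(37.5, 10.6)
print(f"implied baseline: {base:.1f}%, absolute gap: {37.5 - base:.1f} points")
```

Under that assumption the script prints `implied baseline: 33.9%, absolute gap: 3.6 points`, so a 10.6% relative gain corresponds to a gap of a few percentage points in success rate.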
Looking forward, this work raises open questions about how well skills transfer across domains and how dynamic retrieval scales as skill repositories grow. The open-source release should speed community adoption and refinement. As language agents increasingly handle real-world automation tasks, methods that adapt to changing execution contexts will likely become essential infrastructure.
- SGDR enables web agents to select skills dynamically during execution based on the current webpage state, not just the initial task instruction
- The method achieves a 10.6% relative performance improvement over baselines using GPT-4.1 on complex multi-step web tasks
- A dual text-code representation links natural-language skill descriptions with executable actions for more precise retrieval
- State-grounded dynamic retrieval addresses the fundamental mismatch between static skill selection and dynamic webpage environments
- Results across five WebArena domains show consistent improvements, suggesting the approach generalizes across domains