Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance
Researchers introduce EgoProactive, a large-scale egocentric dataset, and Pro²Bench, a unified benchmark, for training and evaluating AI systems that give real-time procedural guidance while detecting and recovering from user deviations. The proposed decoupled planner-interaction architecture outperforms proprietary models (GPT, Claude, Gemini) on intervention quality and off-plan recovery across six diverse datasets.
This research addresses a critical gap in AI assistant capabilities: giving timely, contextual guidance during real-world procedural tasks while adapting when users deviate from the expected sequence. The work moves beyond passive assistants, which respond only when prompted, toward proactive ones that decide on their own when and how to intervene, a significant change in how assistants interact with users.
The technical contribution stems from recognizing that existing benchmarks fail to capture realistic conditions in which users diverge from prescribed steps, a common occurrence in manufacturing, medical procedures, cooking, and assembly. By releasing EgoProactive with explicit Out-of-Plan annotations and converting five established benchmarks to a unified Pro²Bench schema, the researchers provide shared infrastructure for training more robust systems. The decoupled architecture, which separates procedural state modeling (tracking where the user is in the task) from interaction guidance (deciding when and how to intervene), is a deliberate design choice that balances computational efficiency with practical performance.
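The separation between procedural state modeling and interaction guidance can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the planner here is a simple ordered-list tracker rather than a learned model, observations arrive as already-recognized step labels rather than video, and the names `Planner`, `PlanState`, and `InteractionPolicy` are hypothetical. It shows only the division of labor: the planner maintains procedural state and flags Out-of-Plan steps, while the interaction policy decides whether to speak.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch: class and method names are illustrative, not from the paper.


@dataclass
class PlanState:
    """Procedural state owned by the planner."""
    plan: List[str]                                     # reference step order
    completed: List[str] = field(default_factory=list)  # plan steps observed so far
    off_plan: bool = False                              # last observation deviated from the plan


class Planner:
    """Tracks progress through the procedure; has no notion of dialogue."""

    def __init__(self, plan: List[str]) -> None:
        self.state = PlanState(plan=list(plan))

    def expected_next(self) -> Optional[str]:
        # First reference step not yet completed, or None when the procedure is done.
        for step in self.state.plan:
            if step not in self.state.completed:
                return step
        return None

    def observe(self, step: str) -> PlanState:
        # A step is flagged Out-of-Plan if it is not the expected next step
        # (the user skipped ahead, or the step is not in the plan at all).
        self.state.off_plan = step != self.expected_next()
        if step in self.state.plan and step not in self.state.completed:
            self.state.completed.append(step)
        return self.state


class InteractionPolicy:
    """Decides when and how to speak, using only the planner's outputs."""

    def decide(self, state: PlanState, observed: str, next_step: Optional[str]) -> Optional[str]:
        if not state.off_plan or next_step is None:
            return None  # stay silent while the user is on track
        return f"Off plan: '{observed}'. Suggested next step: '{next_step}'."


planner = Planner(["crack eggs", "whisk eggs", "heat pan", "pour eggs"])
policy = InteractionPolicy()
messages = []
for observed in ["crack eggs", "heat pan", "whisk eggs"]:
    state = planner.observe(observed)
    messages.append(policy.decide(state, observed, planner.expected_next()))
```

Here the policy stays silent for the first and third observations and intervenes only when "heat pan" arrives before "whisk eggs". Because the policy reads nothing but the planner's outputs, either side could be retrained or replaced (for example, a learned intervention model in place of the rule above) without changing the other.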
Validation on two backbones, Llama-4 and Qwen-3.6-VL, suggests that the post-training recipe transfers across foundation models rather than depending on one. The gains over Claude Opus, Gemini Pro, and GPT indicate that, for this class of task, a specialized architecture and training recipe can beat general-purpose models.
For the AI industry, this work establishes benchmarks and architectural patterns for a new category of assistive AI systems. Enterprise applications in manufacturing, healthcare, and technical support could adopt these approaches for autonomous coaching systems. The emphasis on recovery pathways, that is, how to guide users back to the intended procedure, reflects a shift toward practical, fault-tolerant deployment rather than idealized scenarios in which users follow every step.
- →EgoProactive dataset introduces explicit Out-of-Plan annotations, enabling training for real-world procedural guidance where users naturally deviate from expected steps
- →Pro²Bench unifies five major egocentric video datasets under a proactive-guidance schema, providing a common training and evaluation framework for this emerging AI capability
- →Specialized decoupled planner-interaction architecture outperforms proprietary AI models (GPT-5.2, Claude Opus, Gemini-3.1) on intervention quality metrics
- →Post-training recipe transfers across backbones (Llama-4 and Qwen-3.6-VL), suggesting it can be applied to different foundation models
- →Recovery-focused design addresses realistic failure modes, improving systems' ability to guide users back to optimal procedures after deviations
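The recovery idea in the last point can be made concrete with a small prerequisite check. This is a hypothetical sketch, not the paper's method: it assumes each procedure comes with hand-written step prerequisites (`prereqs`), treats actions outside the plan as ignorable, and uses the illustrative function name `recovery_plan`. Given the observed history, it flags plan steps that were performed before their prerequisites and lists what remains, in reference order, to bring the user back on track.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sketch: `recovery_plan` and the prerequisite map are illustrative assumptions.


def recovery_plan(
    plan: List[str],
    prereqs: Dict[str, Set[str]],
    history: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (remaining steps in reference order, steps that must be redone)."""
    plan_steps = set(plan)
    valid_done: Set[str] = set()  # steps completed with all prerequisites satisfied
    premature: List[str] = []     # steps performed before their prerequisites
    for step in history:
        if step not in plan_steps:
            continue  # actions outside the plan are ignored in this sketch
        if prereqs.get(step, set()) <= valid_done:
            valid_done.add(step)
        elif step not in premature:
            premature.append(step)
    # A premature step only needs redoing if it was never later done correctly.
    redo = [s for s in premature if s not in valid_done]
    remaining = [s for s in plan if s not in valid_done]
    return remaining, redo


plan = ["attach side panels", "insert shelf", "tighten bolts", "attach back panel"]
prereqs = {
    "insert shelf": {"attach side panels"},
    "tighten bolts": {"insert shelf"},
    "attach back panel": {"tighten bolts"},
}
history = ["attach side panels", "tighten bolts", "insert shelf", "check manual"]
remaining, redo = recovery_plan(plan, prereqs, history)
```

In this example the bolts were tightened before the shelf went in, so "tighten bolts" is flagged for redoing and the remaining path is "tighten bolts" followed by "attach back panel"; the off-plan "check manual" action does not affect the result. Walking the remaining steps in reference order keeps guidance close to the intended procedure, on the assumption that the reference plan itself respects every prerequisite.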