Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Pro²Assist is a step-aware AI assistant that uses augmented reality glasses and multimodal perception to provide real-time, proactive guidance for multi-step procedural tasks. The system tracks user progress continuously and reports 21% higher accuracy in action understanding and 2.29x better timing accuracy than existing baselines. In a study with 20 users, 90% found the assistance useful.
Pro²Assist represents a meaningful advancement in how AI assistants can support users during complex, real-world procedural tasks. Rather than waiting for explicit user queries, the system actively monitors egocentric video from AR glasses to understand task progress and anticipate assistance needs before problems arise. This shift from reactive to proactive support addresses a genuine gap in current AI applications, where most systems remain passive until directly engaged.
The technical achievement centers on multimodal perception and temporal reasoning. By combining visual input from AR glasses with procedural knowledge bases, Pro²Assist keeps track of which step of a procedure the user is on and predicts when guidance will be most helpful. The reported 21% improvement in action understanding and 2.29x improvement in timing accuracy suggest the system makes real headway on the challenges of continuous monitoring and contextual inference across long-horizon tasks.
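To make the step tracking and timing idea concrete, here is a minimal sketch. It is not the Pro²Assist implementation: the `StepTracker` class, its parameters, and the rule "offer help once the user dwells on a step for longer than expected" are all assumptions chosen for illustration. The per-frame step scores stand in for the output of some perception model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StepTracker:
    """Toy step tracker (hypothetical): smooths per-frame step scores and flags long dwell times."""

    expected_frames: list[int]  # assumed typical duration of each step, in frames
    alpha: float = 0.5          # weight of the newest frame in the moving average
    slack: float = 1.5          # offer help once dwell exceeds slack * expected duration
    smoothed: list[float] = field(default_factory=list)
    current_step: int = 0
    dwell: int = 0

    def __post_init__(self) -> None:
        # Start from a uniform belief over steps.
        n = len(self.expected_frames)
        self.smoothed = [1.0 / n] * n

    def update(self, scores: list[float]) -> bool:
        """Consume one frame of per-step scores; return True if help should be offered."""
        # Exponential moving average damps single-frame misclassifications.
        self.smoothed = [
            self.alpha * s + (1 - self.alpha) * prev
            for s, prev in zip(scores, self.smoothed)
        ]
        best = max(range(len(self.smoothed)), key=self.smoothed.__getitem__)
        if best != self.current_step:
            # The user moved to a different step: reset the dwell counter.
            self.current_step = best
            self.dwell = 0
        else:
            self.dwell += 1
        # Proactive trigger: the user has stayed on this step unusually long.
        return self.dwell > self.slack * self.expected_frames[self.current_step]


tracker = StepTracker(expected_frames=[2, 2])
for frame_scores in [[0.9, 0.1]] * 4:
    offer_help = tracker.update(frame_scores)
print(tracker.current_step, offer_help)
```

The smoothing step trades responsiveness for stability. A larger `alpha` follows the newest frame more closely, while a smaller one is slower to register a step change but less sensitive to noisy frames.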
For the AR and AI assistance market, this work supports the case for egocentric AR systems as platforms for intelligent support. Instead of being delivered at a desk, assistance becomes available continuously during physical activity. This expands addressable use cases beyond entertainment and communication into practical domains such as manufacturing, cooking, medical procedures, and maintenance.
The user study, in which 90% of participants found the assistance useful, suggests the approach resonates with real users and does not only improve benchmark metrics. Future development will likely focus on scaling beyond laboratory environments, integrating with commercial AR platforms like Apple Vision Pro or Microsoft HoloLens, and extending to domain-specific applications where procedural guidance has the highest value.
- Pro²Assist achieves 21% higher accuracy in procedural action understanding than baseline systems through continuous multimodal monitoring.
- The system demonstrates 2.29x better timing accuracy in delivering proactive assistance by reasoning over user state and task progress.
- In real-world testing with 20 users, 90% found the proactive assistance approach useful for procedural task support.
- The approach uses egocentric AR glasses as the primary sensing platform, enabling continuous task monitoring without explicit user queries.
- Multi-scale temporal dynamics and task-specific knowledge extraction enable the system to understand fine-grained procedural context across long-horizon activities.
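As a rough illustration of what "multi-scale temporal dynamics" can mean in general, the sketch below summarizes a stream of per-frame values over several window lengths at once. This is a generic technique, not the method used in Pro²Assist. The function name `multi_scale_means` and the window sizes are assumptions made for the example.

```python
from __future__ import annotations


def multi_scale_means(
    values: list[float], windows: tuple[int, ...] = (2, 4, 8)
) -> dict[int, float]:
    """Return the mean of the most recent `w` values for each window length `w`."""
    if not values:
        raise ValueError("values must be non-empty")
    summary: dict[int, float] = {}
    for w in windows:
        recent = values[-w:]  # shorter histories simply use what is available
        summary[w] = sum(recent) / len(recent)
    return summary


# A signal that was 0 for four frames and then 1 for four frames.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(multi_scale_means(signal))
```

Short windows react quickly to the latest action, while long windows retain context from earlier in the task. Comparing the two is one simple way to tell a momentary change from a sustained one.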