EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets
EgoAERO introduces a framework that lets robots learn dexterous manipulation skills from a single egocentric human video without pre-scanned object assets or CAD models. The system reconstructs hand-object trajectories and converts them into robot policies. It is supported by a new large-scale dataset, EgoDex-R, containing 4.3M RGB-D frames, and achieves performance comparable to traditional asset-dependent methods.
EgoAERO removes the traditional dependency on object assets and CAD models, a major bottleneck in scaling dexterous manipulation training. The framework reconstructs contact-consistent trajectories through asset-free tracking and adaptive optimization, then applies residual learning to convert human demonstrations into executable robot policies. This matters because egocentric video is an abundant source of manipulation data from everyday human activity, yet it remains largely untapped because it lacks geometric and contact information.
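The residual-learning step can be illustrated with a minimal sketch. It assumes the common formulation in which a reference action taken from the reconstructed human trajectory is adjusted by a small learned correction; the summary does not specify EgoAERO's actual formulation, and the names `residual_action`, `residual_policy`, and `scale` are hypothetical.

```python
from typing import Callable, Sequence


def residual_action(
    reference: Sequence[float],
    observation: Sequence[float],
    residual_policy: Callable[[Sequence[float]], Sequence[float]],
    scale: float = 0.1,
) -> list[float]:
    """Combine a reference action with a scaled learned correction.

    Assumption: `reference` is the action retargeted from the human
    demonstration and `residual_policy` is a learned function of the
    robot's observation. Neither name comes from the paper.
    """
    correction = residual_policy(observation)
    if len(correction) != len(reference):
        raise ValueError("correction and reference must have the same length")
    # Final command = demonstrated action + small learned adjustment.
    return [r + scale * c for r, c in zip(reference, correction)]


# Usage with a stand-in policy that always returns a fixed correction.
command = residual_action(
    reference=[0.5, -0.2],
    observation=[0.0],
    residual_policy=lambda obs: [1.0, -1.0],
)
```

Keeping `scale` small keeps the policy close to the demonstration, so the learned part only has to cover the gap between human and robot execution.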
The research addresses a fundamental challenge in imitation learning: bridging the sim-to-real gap while working with incomplete visual information. By introducing ego motion compensation and contact optimization that work without pre-defined object models, EgoAERO makes learning from human demonstrations substantially more practical. EgoDex-R, with its 4.3M RGB-D frames of egocentric manipulation data, provides the infrastructure needed to move the field beyond single-demonstration capabilities.
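Ego motion compensation can be sketched as follows, assuming a per-frame camera-to-world pose (rotation matrix and translation vector) is available. A point observed in the moving camera frame is mapped into a fixed world frame, so that head motion is not mistaken for object motion. This is the standard rigid transform, not necessarily the procedure EgoAERO uses, and the function names are hypothetical.

```python
from typing import Sequence

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def camera_to_world(point_cam: Vector3, rotation: Matrix3, translation: Vector3) -> list[float]:
    """Apply p_world = R @ p_cam + t for one 3D point."""
    return [
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def compensate_ego_motion(
    points_cam: Sequence[Vector3],
    poses: Sequence[tuple[Matrix3, Vector3]],
) -> list[list[float]]:
    """Express one tracked point per frame in a shared world frame.

    Assumption: poses[k] is the camera-to-world pose (R, t) of frame k.
    """
    return [camera_to_world(p, rot, t) for p, (rot, t) in zip(points_cam, poses)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A static object seen from a camera that moves 0.5 m along x between frames.
observed = [[1.0, 0.0, 2.0], [0.5, 0.0, 2.0]]
poses = [(IDENTITY, [0.0, 0.0, 0.0]), (IDENTITY, [0.5, 0.0, 0.0])]
world_track = compensate_ego_motion(observed, poses)
```

In this example the object appears to shift in the camera frame, but both world-frame positions coincide, which is the property a contact optimizer would rely on.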
For robotics developers and AI researchers, this work reduces the friction of creating training datasets and lowers the barrier to entry for building dexterous manipulation systems. Reaching near-CAD-baseline performance from visual information alone supports the robustness of the approach, and real-world experiments strengthen confidence in its practical applicability. Together, these results suggest the framework could shorten development cycles for industrial and service robotics applications that require fine motor control.
- EgoAERO enables dexterous robot learning from single egocentric videos without requiring object CAD models or pre-scanned assets.
- The framework reconstructs hand-object contact trajectories through asset-free tracking and adaptive optimization before converting to robot policies.
- EgoDex-R dataset provides 4.3M RGB-D frames of egocentric manipulation data for large-scale dexterous policy learning.
- Performance on HOI4D approaches CAD-based reconstruction methods, validating the asset-free approach.
- Online quality assessment mechanisms enable automated filtering and evaluation of demonstration quality.