From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation
Researchers present a generative framework that converts real-world panoramic images into high-fidelity simulation scenes for robot training, using semantic and geometric editing to create diverse training variants. The approach demonstrates strong sim-to-real correlation and enables robots to generalize better to unseen environments and objects through scaled synthetic data generation.
This research addresses a critical bottleneck in robotics: the expensive and time-consuming process of collecting diverse real-world training data for robust robot policies. Traditional approaches require physically reconfiguring environments and acquiring multiple assets, making large-scale data collection prohibitively costly. The proposed generative framework bypasses these limitations by automatically converting real-world panoramas into photorealistic simulation environments, then algorithmically generating variations—termed 'Digital Cousins'—through semantic and geometric modifications. This approach bridges the simulation-to-reality gap, a persistent challenge in robotics where models trained in simulation often fail in the real world due to visual and physical mismatches.
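The paper does not publish its data structures or editing API, but the Digital Cousins idea can be illustrated with a minimal sketch: starting from one reconstructed scene, produce variants by semantic edits (swapping an object's asset for another of the same category) and geometric edits (jittering pose and scale). All names below (`make_digital_cousins`, the scene dictionary layout, the asset library) are hypothetical stand-ins, not the authors' implementation.

```python
import copy
import random

def make_digital_cousins(scene, asset_library, n_variants=3, seed=0):
    """Sketch of 'Digital Cousin' generation: for each variant, apply a
    semantic edit (swap each object's mesh within its category) and a
    geometric edit (small perturbations of position and scale)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        cousin = copy.deepcopy(scene)  # leave the source scene untouched
        for obj in cousin["objects"]:
            # Semantic edit: pick another asset from the same category.
            candidates = asset_library.get(obj["category"], [obj["mesh"]])
            obj["mesh"] = rng.choice(candidates)
            # Geometric edit: jitter position (+/- 5 cm) and scale (+/- 10%).
            obj["position"] = [p + rng.uniform(-0.05, 0.05) for p in obj["position"]]
            obj["scale"] = obj["scale"] * rng.uniform(0.9, 1.1)
        variants.append(cousin)
    return variants

# Toy example: one chair, three interchangeable chair assets.
scene = {"objects": [{"category": "chair", "mesh": "chair_a.obj",
                      "position": [1.0, 0.0, 0.5], "scale": 1.0}]}
library = {"chair": ["chair_a.obj", "chair_b.obj", "chair_c.obj"]}
cousins = make_digital_cousins(scene, library, n_variants=5)
print(len(cousins))  # 5 variant scenes
```

Because each variant is generated programmatically from the same source reconstruction, producing more training environments costs only compute, which is the economic point the paragraph above makes.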
The framework's integration of multi-room stitching for long-horizon navigation tasks demonstrates practical applicability beyond simple manipulation scenarios. By establishing reproducible, diverse training environments from minimal real-world input, the research lets training data generation scale far beyond what physical collection allows, at minimal marginal cost per variant. The empirical validation of sim-to-real correlation is particularly significant, suggesting that high-fidelity synthetic data can effectively substitute for some real-world collection efforts.
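The mechanics of the authors' multi-room stitching are not detailed in the summary, but the idea can be sketched as composing per-room reconstructions into one connected layout that a navigation policy must traverse. The function below (`stitch_rooms`), the axis-aligned placement, and the doorway waypoints are illustrative assumptions only.

```python
def stitch_rooms(rooms, spacing=6.0):
    """Sketch of multi-room stitching: place each reconstructed room scene
    along the x-axis and record a doorway waypoint between neighbouring
    rooms, yielding a single long-horizon navigation layout."""
    stitched = {"objects": [], "waypoints": []}
    for i, room in enumerate(rooms):
        offset = i * spacing
        for obj in room["objects"]:
            placed = dict(obj)
            x, y, z = obj["position"]
            placed["position"] = [x + offset, y, z]  # shift room into its slot
            stitched["objects"].append(placed)
        if i > 0:
            # Doorway waypoint on the boundary shared with the previous room.
            stitched["waypoints"].append([offset - spacing / 2.0, 0.0, 0.0])
    return stitched

# Toy example: a kitchen stitched to a hallway.
kitchen = {"objects": [{"name": "table", "position": [1.0, 2.0, 0.0]}]}
hallway = {"objects": [{"name": "lamp", "position": [0.5, 1.0, 0.0]}]}
layout = stitch_rooms([kitchen, hallway])
print(len(layout["objects"]), len(layout["waypoints"]))  # 2 1
```

The waypoints give a long-horizon task its intermediate goals, which is what distinguishes stitched navigation environments from the single-scene manipulation setups mentioned above.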
For the robotics and AI industries, this represents a meaningful step toward cost-effective robot training at scale. Companies developing autonomous systems stand to benefit from accelerated development cycles and reduced infrastructure costs. The work influences broader trends in synthetic data generation, where AI-generated training environments are becoming competitive with real-world collection for certain applications. As robotics applications expand into manufacturing, logistics, and service sectors, efficient training methodologies become increasingly valuable, potentially accelerating commercialization timelines and reducing deployment costs for robotic systems.
- Generative framework converts real panoramas into diverse high-fidelity simulation scenes for robot training without expensive physical reconfiguration
- Digital Cousins approach enables algorithmic generation of scene variations for scaled synthetic data production at minimal marginal cost
- Strong sim-to-real correlation demonstrated, validating synthetic training data's effectiveness for real-world robot deployment
- Multi-room stitching capability extends framework beyond manipulation tasks to complex long-horizon navigation scenarios
- Results show significantly improved generalization to unseen environments and objects through scaled synthetic data augmentation