World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
Researchers introduce World Action Verifier (WAV), a framework that enables world models to self-correct prediction errors by decomposing action-conditioned predictions into two verifiable components: state plausibility and action reachability. The approach achieves 2x higher sample efficiency and a 22% improvement in downstream policy performance across robotic control tasks by exploiting asymmetries in data availability and feature dimensionality.
World models predict environment dynamics for planning and control. The problem WAV addresses is that such models are typically trained on demonstration data biased toward optimal actions, which leaves them unreliable when evaluating suboptimal or exploratory action sequences. That gap matters in real-world robotics, where models must handle unexpected behaviors.
The research tackles the problem through decomposition, shifting from direct forward prediction to verifiable factors and exploiting two asymmetries: unlabeled video data is abundant and can be used for state validation, and inverse modeling needs only lower-dimensional, action-relevant features. The framework combines a subgoal generator trained on video corpora with sparse inverse models to form a cycle-consistency verification mechanism, which enables self-improvement in underexplored regions of the action space. The empirical results on MiniGrid, RoboMimic, and ManiSkill show gains in both sample efficiency and downstream policy performance, suggesting the approach carries across diverse robotic domains.
This work matters because sample efficiency translates directly into lower training costs and faster deployment for robotics systems. The research moves toward more generalizable, robust world models that can operate beyond their training distributions, a prerequisite for autonomous systems in real-world environments where unexpected situations are inevitable. The insights about data asymmetries could also inform broader machine learning practice where perfect supervision is unavailable.
- WAV decomposes world model predictions into two verifiable components: state plausibility and action reachability
- Action-free video data and sparse inverse models yield 2x sample efficiency improvements
- Cycle-consistency verification enables self-improvement in underexplored action regimes
- Downstream policy performance improves by 22% across nine robotic control tasks
- The framework addresses a gap where world models fail on suboptimal actions that are underrepresented in training data
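The cycle-consistency idea described above can be illustrated with a toy sketch. This is not the WAV implementation: the grid world, the deliberately flawed forward model, and every name below (`flawed_world_model`, `is_plausible`, `inverse_model`, `verify`, `find_flagged`) are hypothetical stand-ins chosen for illustration. Hand-written rules take the place of the learned subgoal generator and sparse inverse models. The sketch only shows how checking state plausibility and action reachability can flag a forward model's errors on an underrepresented action.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]
GRID_SIZE = 5  # hypothetical 5x5 grid world

MOVES: Dict[str, Tuple[int, int]] = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def true_step(state: State, action: str) -> State:
    """Ground-truth dynamics: move one cell, clamped to the grid."""
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), GRID_SIZE - 1)
    y = min(max(state[1] + dy, 0), GRID_SIZE - 1)
    return (x, y)


def flawed_world_model(state: State, action: str) -> State:
    """Assumed forward model that mispredicts 'left', imagined here as an
    action that was rare in its training demonstrations."""
    if action == "left":
        return true_step(state, "right")
    return true_step(state, action)


def is_plausible(state: State) -> bool:
    """State plausibility: a rule standing in for a check learned from video."""
    x, y = state
    return 0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE


def inverse_model(state: State, next_state: State) -> Optional[str]:
    """Action reachability: which action, if any, explains the transition."""
    delta = (next_state[0] - state[0], next_state[1] - state[1])
    for action, move in MOVES.items():
        if move == delta:
            return action
    return None


def verify(state: State, action: str, predicted: State) -> bool:
    """Cycle check: the prediction must be a plausible state, and the
    inverse model must recover the action that was actually requested."""
    if not is_plausible(predicted):
        return False
    return inverse_model(state, predicted) == action


def find_flagged(
    model: Callable[[State, str], State],
) -> List[Tuple[State, str, State]]:
    """Collect (state, action, prediction) triples that fail verification.
    Only interior cells are scanned so wall clamping does not interfere."""
    flagged = []
    for x in range(1, GRID_SIZE - 1):
        for y in range(1, GRID_SIZE - 1):
            for action in MOVES:
                predicted = model((x, y), action)
                if not verify((x, y), action, predicted):
                    flagged.append(((x, y), action, predicted))
    return flagged


flagged = find_flagged(flawed_world_model)
print(f"{len(flagged)} predictions failed verification")
```

In this toy setting the flagged triples are exactly the predictions worth relabeling or collecting more data for. In a learned system both checks would themselves be imperfect models, so a failed check is evidence of a forward-model error rather than proof.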