AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Turning Video Models into Generalist Robot Policies
Researchers present VERA, a decoupled approach to robot control that separates video prediction from action execution. Rather than fine-tuning a video model on action labels, the method leaves the video planner unchanged and trains embodiment-specific inverse dynamics models to translate predicted frames into robot actions, which enables zero-shot cross-embodiment generalization.
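The split described above can be illustrated with a toy sketch. This is not VERA's code: `plan_frames`, `make_scaled_idm`, and `frames_to_actions` are hypothetical names, the "frames" are low-dimensional tuples rather than images, and the inverse dynamics models are hand-written functions rather than trained networks. The only point is that one shared plan can drive different embodiments when nothing but the frame-to-action translator is swapped.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[float, ...]   # toy stand-in for an image: a low-dimensional state
Action = Tuple[float, ...]
InverseDynamics = Callable[[Frame, Frame], Action]


def plan_frames(start: Frame, goal: Frame, steps: int) -> List[Frame]:
    """Hypothetical stand-in for the frozen video planner.

    It linearly interpolates from start to goal and knows nothing about actions.
    """
    return [
        tuple(s + (g - s) * k / steps for s, g in zip(start, goal))
        for k in range(steps + 1)
    ]


def make_scaled_idm(gain: float) -> InverseDynamics:
    """Toy embodiment-specific inverse dynamics model (assumption, not learned).

    The action is the frame-to-frame change scaled by a per-robot gain.
    """
    def idm(frame: Frame, next_frame: Frame) -> Action:
        return tuple(gain * (b - a) for a, b in zip(frame, next_frame))
    return idm


def frames_to_actions(frames: Sequence[Frame], idm: InverseDynamics) -> List[Action]:
    """Translate each consecutive pair of predicted frames into one action."""
    return [idm(a, b) for a, b in zip(frames, frames[1:])]


# One shared plan, two embodiments: only the inverse dynamics model changes.
frames = plan_frames(start=(0.0, 0.0), goal=(1.0, 2.0), steps=4)
arm_a_actions = frames_to_actions(frames, make_scaled_idm(gain=1.0))
arm_b_actions = frames_to_actions(frames, make_scaled_idm(gain=2.0))
```

In this sketch the planner is never modified when a new robot is added; supporting another embodiment means supplying another inverse dynamics function.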