🧠 AI · 🟢 Bullish · Importance 7/10

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

arXiv – CS AI | Weijia Liufu, Xiaoyu Guo, Ruiyi Chen, Jingzhi Liu, Kaidong Zhang, Xiwen Liang, Jianqi Lin, Dawei Sun, Yuze Wang, Rongtao Xu, Bingqian Lin, Bowen Yang, Tongtong Cao, Bowen Peng, Dongyu Zhang, Guangrun Wang, Min Wang, Liang Lin, Xiaodan Liang
🤖 AI Summary

Researchers introduce RePO-VLA, a policy optimization framework that improves Vision-Language-Action models' ability to recover from failures in complex manipulation tasks. The method increases adversarial robustness from 20% to 75% by learning from recovery trajectories rather than discarding failed attempts, with validation on both simulated and real-world robotic tasks.

Analysis

RePO-VLA addresses a fundamental challenge in robotic learning: current VLA models rely heavily on success-only imitation, which provides minimal guidance when execution deviates from expected trajectories. This brittleness severely limits deployment in contact-rich, long-horizon tasks where perturbations are inevitable. The framework reframes failure data as valuable training signal by decomposing trajectories into three distinct categories—successes, recoveries, and failures—each serving a specific pedagogical purpose.

The technical innovation centers on three components working in concert. Recovery-Aware Initialization decouples corrective actions from preceding failures by resetting history context, ensuring the model learns state-specific recovery rather than memorizing failure sequences. The Progress-Aware Semantic Value Function creates nuanced labels that distinguish nominal actions from corrective ones while identifying unrecoverable drift. Value-Conditioned Refinement then biases the policy toward high-progress actions during training and deployment using a fixed value signal, eliminating the need for failure detection heuristics.
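The three-way trajectory decomposition described above can be sketched in code. The snippet below is an illustrative toy, not the paper's actual implementation: the segment labels, the progress-based splitting rule, and the `value_weight` heuristic are all assumptions introduced here to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical trajectory categories mirroring the decomposition into
# successes, recoveries, and failures described in the text.
class Segment(Enum):
    SUCCESS = "success"
    RECOVERY = "recovery"
    FAILURE = "failure"

@dataclass
class Step:
    state: tuple          # placeholder for an observation embedding
    action: tuple         # placeholder for an action vector
    progress: float       # task-progress estimate in [0, 1]

def label_segments(trajectory, recovered):
    """Toy labeling rule: steps where progress drops below the best
    progress so far are failures; the first step after a failure that
    pushes past the previous best is a recovery (if the rollout
    ultimately recovered); everything else is a nominal success."""
    labels = []
    best = 0.0
    for step in trajectory:
        if step.progress >= best:
            corrective = labels and labels[-1] is Segment.FAILURE and recovered
            labels.append(Segment.RECOVERY if corrective else Segment.SUCCESS)
            best = step.progress
        else:
            labels.append(Segment.FAILURE)
    return labels

def value_weight(label, progress):
    """Toy value-conditioned weight: favor high-progress and corrective
    steps, and give failing actions zero imitation weight."""
    if label is Segment.FAILURE:
        return 0.0  # never imitate the failing action itself
    bonus = 0.5 if label is Segment.RECOVERY else 0.0
    return progress + bonus
```

Under this sketch, a rollout that dips and then recovers yields a mix of success, failure, and recovery labels, so the corrective step can be up-weighted during training instead of being discarded with the rest of the failed attempt.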

For the robotics and embodied AI community, these results represent meaningful progress toward more reliable autonomous systems. Raising adversarial success rates from 20% to 75% substantially narrows the robustness gap that prevents real-world deployment. The introduction of FRBench—a standardized benchmark with error injection and recovery-focused metrics—provides essential infrastructure for comparing future approaches. And the real-world validation on bimanual tasks shows that the gains carry over from simulation to physical hardware.

Looking forward, the field will likely explore whether recovery-driven optimization generalizes to other embodied AI domains beyond manipulation and whether human-in-the-loop recovery collection can be further automated.

Key Takeaways
  • RePO-VLA treats failure trajectories as learning opportunities rather than discarding them, fundamentally changing how VLA models process diverse execution outcomes.
  • The framework achieves 75% adversarial robustness on average, up from 20%, through recovery-aware initialization and value-conditioned refinement.
  • FRBench introduces standardized error injection and recovery-focused evaluation metrics, establishing infrastructure for reproducible robustness benchmarking.
  • The approach requires no online failure detection or heuristic retries at deployment, using a fixed value signal to bias toward learned success manifolds.
  • Real-world validation on bimanual robotic tasks demonstrates the method extends beyond simulation, advancing practical embodied AI deployment.
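The fourth takeaway—no online failure detection at deployment—can be illustrated with a minimal sketch. The `act` wrapper and the `V_HIGH` constant are hypothetical names assumed here; the point is only that the value condition is pinned to a fixed high-progress signal rather than computed by a runtime failure detector.

```python
# Illustrative value-conditioned inference, assuming a policy trained to
# accept a scalar value condition alongside its inputs.
V_HIGH = 1.0  # fixed "high-progress" conditioning signal

def act(policy, observation, instruction):
    """At deployment the value condition is simply pinned to V_HIGH,
    biasing the policy toward its learned success manifold without any
    online failure detector or heuristic retry logic."""
    return policy(observation, instruction, value=V_HIGH)

# Usage with a stand-in policy:
dummy_policy = lambda obs, instr, value: ("action", value)
act(dummy_policy, observation=None, instruction="stack the blocks")
```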