
ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

arXiv – CS AI | Haodi Hu, Chung-Ta Huang, Jing Liu, Ye Wang, Kei Suzuki, Matthew Brand, Toshiaki Koike-Akino | June 9, 2026 at 04:00 AM

AI Summary

ReCoVLA introduces a framework that enhances vision-language-action (VLA) policies by using external vision-language models (VLMs) to identify failures and guide residual policy training for recovery. The approach freezes pretrained VLA policies and compiles structured rewards for correction, achieving 66.7% success in simulation, against 36.7% for baseline fine-tuning, and 61.7% in zero-shot real-world deployment.

Analysis

ReCoVLA addresses a critical limitation in current vision-language-action policies: their brittleness when encountering off-nominal or failure states during robotic manipulation tasks. Rather than retraining entire models or relying on direct VLM-generated actions, the framework decouples high-level semantic understanding from low-level motor control: an external VLM identifies the failure, and a residual policy trained on compiled rewards handles the correction. Because the pretrained VLA policy stays frozen, the same approach can be applied across different VLA architectures, and training is limited to the residual policy rather than the full model.
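The frozen-policy-plus-residual structure can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the additive combination, the function name `compose_action`, the `scale` factor, and the clipping `limit` are all assumptions made for the example, since the summary does not say how the residual is applied.

```python
from typing import List


def compose_action(
    base_action: List[float],
    residual_action: List[float],
    scale: float = 0.1,   # hypothetical: how strongly the residual may correct
    limit: float = 1.0,   # hypothetical: symmetric actuator bound
) -> List[float]:
    """Add a scaled residual correction to a frozen policy's action.

    base_action comes from the pretrained VLA policy, which is never updated.
    residual_action comes from the small recovery policy that is trained.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("action dimensions must match")
    combined = [b + scale * r for b, r in zip(base_action, residual_action)]
    # Clip so the correction cannot push the command outside the bounds.
    return [max(-limit, min(limit, a)) for a in combined]
```

With a zero residual the output equals the base action, so under these assumptions the frozen policy's nominal behavior is preserved whenever no correction is needed.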

The research builds on growing interest in hybrid approaches that combine large pretrained models with targeted fine-tuning. As robotics increasingly depends on foundation models for language understanding, the challenge shifts from initial task performance to robust failure recovery. ReCoVLA's use of external VLMs as semantic reward selectors rather than direct action generators represents a pragmatic middle ground, reducing the burden on language models while leveraging their strengths in contextual understanding.
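To make the idea of a VLM acting as a reward selector concrete, here is a hedged sketch. It assumes the VLM's output has already been reduced to a failure label, and that the label picks weighted terms from a fixed reward library. The labels, term names, weights, and state keys below are hypothetical; the summary does not describe the actual reward terms or how ReCoVLA compiles them.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
RewardFn = Callable[[State], float]

# Hypothetical library of hand-written reward terms.
REWARD_LIBRARY: Dict[str, RewardFn] = {
    "reach": lambda s: -s["gripper_to_object_dist"],
    "grasp": lambda s: 1.0 if s["object_grasped"] > 0.5 else 0.0,
    "lift": lambda s: s["object_height"],
}

# Hypothetical mapping from a VLM-assigned failure label to weighted terms.
FAILURE_TO_TERMS: Dict[str, List[Tuple[str, float]]] = {
    "missed_grasp": [("reach", 1.0), ("grasp", 2.0)],
    "dropped_object": [("grasp", 2.0), ("lift", 1.0)],
}


def compile_reward(failure_label: str) -> RewardFn:
    """Build one scalar reward function from the terms a label selects."""
    terms = FAILURE_TO_TERMS[failure_label]  # KeyError on unknown labels

    def reward(state: State) -> float:
        return sum(w * REWARD_LIBRARY[name](state) for name, w in terms)

    return reward


# Usage: the label would come from the external VLM; here it is hard-coded.
state = {"gripper_to_object_dist": 0.2, "object_grasped": 1.0, "object_height": 0.3}
recovery_reward = compile_reward("missed_grasp")
value = recovery_reward(state)
```

In this arrangement the VLM only chooses among predefined terms and never emits motor commands, which is the division of labor the analysis describes.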

The performance improvements are substantial. In simulation, success rises from 36.7% with baseline fine-tuning to 66.7%, roughly a 1.8x improvement, and zero-shot real-world deployment reaches 61.7% without additional physical training. The zero-shot sim-to-real transfer matters particularly for robotics applications where real-world data collection is expensive. This approach could influence how teams develop robotic systems by emphasizing modular failure recovery strategies over monolithic policy learning, potentially reducing development timelines and data requirements for production deployments.

Key Takeaways

→ReCoVLA achieves 66.7% success in simulation compared to 36.7% baseline by using VLM-guided reward compilation for failure recovery
→The framework decouples semantic understanding from motor control by using vision-language models as reward selectors rather than action generators
→Zero-shot sim-to-real transfer achieves 61.7% success, enabling deployment without additional physical robot training
→The approach remains compatible with different VLA architectures by keeping pretrained policies frozen and training only residual recovery policies
→Modular failure recovery design reduces development complexity and data requirements compared to full policy retraining

#robotics #vision-language-models #reinforcement-learning #sim-to-real #manipulation #failure-recovery #vla-policies

