AI | Neutral | Importance 6/10

DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

arXiv – CS AI | Yixiang Zhu, Yonghao Chen, Zijie Yang, Yusong Hu, Xinyu Chen
🤖AI Summary

Researchers introduce DEFLECT, an offline post-training framework that improves Vision-Language-Action (VLA) robot policies by addressing latency-induced misalignment in asynchronous inference. The method uses counterfactual preference learning to teach policies to favor execution-time-aligned actions over stale prediction-time actions, achieving up to 6.4 percentage-point improvements in high-latency success rates without requiring human labels, reward models, or architectural changes.

Analysis

DEFLECT addresses a fundamental challenge in deploying large Vision-Language-Action (VLA) models for robotic control: the temporal mismatch between when an action is computed and when it executes. Robots increasingly use asynchronous inference to mask model latency, so an action predicted from a stale observation executes after the environment has already changed. Existing approaches do not adequately resolve the resulting loss of robustness.
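To make the mismatch concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the 1-D drifting target, the function names, and the parameters are all hypothetical, chosen only to show how an action computed at step t and executed at step t + delay misses the state it was meant for.

```python
def target_position(step: int, velocity: float = 0.5) -> float:
    """Hypothetical 1-D target that drifts at a constant velocity."""
    return velocity * step


def tracking_error(delay_steps: int, horizon: int = 20) -> float:
    """Mean absolute error when each action is computed from the observation
    at step t but only executed at step t + delay_steps."""
    errors = []
    for t in range(horizon):
        action = target_position(t)                # based on the stale observation
        actual = target_position(t + delay_steps)  # state when the action executes
        errors.append(abs(actual - action))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for delay in (0, 2, 4):
        print(f"delay={delay} steps -> mean error {tracking_error(delay):.2f}")
```

In this toy setting the error grows linearly with the delay, which is the kind of latency-induced misalignment the article describes; real VLA deployments involve far richer dynamics.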

The framework's innovation lies in converting this latency problem into a training signal through counterfactual preference learning. Rather than relying on human annotations or expensive online robot rollouts, DEFLECT uses a frozen reference VLA to generate a preferred action from the future (execution-time) observation and a rejected action from the stale (prediction-time) observation. The trainable policy then learns to prefer the former under deployment-time conditions, which teaches delay robustness without external supervision. This approach narrows the gap between training and deployment dynamics that typically plagues offline learning.
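The pair construction described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the summary does not specify the loss, so a DPO-style objective is assumed here, and the scalar observations, unit-variance Gaussian policy, and all names (`PreferencePair`, `build_pairs`, `dpo_style_loss`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PreferencePair:
    context: float   # stale observation the policy sees at deployment time
    chosen: float    # reference action from the future (execution-time) observation
    rejected: float  # reference action from the stale (prediction-time) observation


def build_pairs(observations: Sequence[float],
                reference_policy: Callable[[float], float],
                delay_steps: int) -> List[PreferencePair]:
    """Label pairs with a frozen reference policy; no human labels are needed."""
    pairs = []
    for t in range(len(observations) - delay_steps):
        stale, future = observations[t], observations[t + delay_steps]
        pairs.append(PreferencePair(stale, reference_policy(future),
                                    reference_policy(stale)))
    return pairs


def gaussian_log_prob(action: float, mean: float, std: float = 1.0) -> float:
    """Log-density of a Gaussian action distribution (assumed policy form)."""
    return -0.5 * ((action - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def dpo_style_loss(pair: PreferencePair, policy_mean: float,
                   reference_mean: float, beta: float = 1.0) -> float:
    """Assumed DPO-style loss: -log sigmoid(beta * margin), where the margin
    compares policy and reference log-probabilities of chosen vs. rejected."""
    margin = ((gaussian_log_prob(pair.chosen, policy_mean)
               - gaussian_log_prob(pair.chosen, reference_mean))
              - (gaussian_log_prob(pair.rejected, policy_mean)
                 - gaussian_log_prob(pair.rejected, reference_mean)))
    return math.log1p(math.exp(-beta * margin))


# Toy usage: an identity reference policy on a drifting scalar state.
observations = [0.0, 1.0, 2.0, 3.0, 4.0]
pairs = build_pairs(observations, reference_policy=lambda obs: obs, delay_steps=2)
first = pairs[0]
loss_stale = dpo_style_loss(first, policy_mean=first.rejected, reference_mean=first.rejected)
loss_aligned = dpo_style_loss(first, policy_mean=first.chosen, reference_mean=first.rejected)
```

In this toy case the loss is lower when the policy's mean matches the execution-time action than when it matches the stale one, so minimizing it pushes the policy toward delay-aligned actions while it still conditions only on the stale observation.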

For the robotics and embodied AI communities, DEFLECT offers a practical solution to a persistent deployment problem. Because the method needs no human preferences, reward models, or architectural modifications, it can be applied directly to existing VLA systems. Reported improvements across Kinetix, LIBERO, and real-robot tasks suggest the approach generalizes beyond a single environment.

The work's significance extends to broader questions about deploying foundation models in real-time systems. As VLAs become larger and slower, latency-robustness techniques become increasingly critical. DEFLECT establishes a template for offline adaptation methods that address deployment-time distribution shifts without compromising model integrity or requiring online experimentation.

Key Takeaways
  • DEFLECT uses counterfactual preference learning to teach VLAs to handle inference latency without human annotations or additional online rollouts
  • The method converts temporal mismatch between prediction and execution into supervised training signals using a reference model
  • Reported experiments show up to 6.4 percentage-point improvements in success rates under high-latency conditions
  • The approach requires no architectural changes or inference-time computation overhead, enabling easy integration into existing systems
  • The framework addresses a critical deployment challenge for large VLA policies in time-sensitive robotic applications