AI · Neutral · arXiv – CS AI · 9h ago · 6/10
DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs
Researchers introduce DEFLECT, an offline post-training framework that improves Vision-Language-Action (VLA) robot policies by addressing latency-induced misalignment in asynchronous inference. The method uses counterfactual preference learning to teach policies to favor execution-time-aligned actions over stale prediction-time actions. It raises success rates under high latency by up to 6.4 percentage points and requires no human labels, reward models, or architectural changes.
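
To make the idea of preferring execution-time-aligned actions over stale ones concrete, here is a minimal sketch of how such preference pairs and a pairwise loss could look. It is an illustration under stated assumptions, not the paper's implementation: the function names (`build_pairs`, `gaussian_log_prob`, `preference_loss`), the scalar actions, the fixed integer `delay`, the Gaussian policy, and the Bradley-Terry style logistic loss with scale `beta` are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def build_pairs(actions: Sequence[float], delay: int) -> List[Tuple[int, float, float]]:
    """Build (t, chosen, rejected) triples from a logged action sequence.

    Assumption: the logged action at step t + delay stands in for the
    execution-time-aligned action, and the logged action at step t stands in
    for the stale prediction-time action. No human labels are involved.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    pairs = []
    for t in range(len(actions) - delay):
        chosen = actions[t + delay]  # aligned with when the action executes
        rejected = actions[t]        # aligned with the stale observation
        pairs.append((t, chosen, rejected))
    return pairs


def gaussian_log_prob(action: float, mean: float, std: float = 1.0) -> float:
    """Log-density of a scalar action under a Gaussian policy head (assumed)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def preference_loss(logp_chosen: float, logp_rejected: float, beta: float = 1.0) -> float:
    """Pairwise logistic loss: -log(sigmoid(beta * (logp_chosen - logp_rejected)))."""
    margin = beta * (logp_chosen - logp_rejected)
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    logged_actions = [0.0, 1.0, 2.0, 3.0]
    for t, chosen, rejected in build_pairs(logged_actions, delay=2):
        policy_mean = chosen  # a policy that already predicts the aligned action
        loss = preference_loss(
            gaussian_log_prob(chosen, policy_mean),
            gaussian_log_prob(rejected, policy_mean),
        )
        print(t, chosen, rejected, round(loss, 4))
```

The loss falls below log 2 when the policy assigns higher likelihood to the aligned action than to the stale one, and rises above it in the opposite case, so minimizing it pushes the policy toward actions that fit the moment of execution.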