Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
Researchers evaluated Vision-Language-Action models in autonomous driving under sensor degradation, finding that explanation consistency (Chain-of-Causation) strongly correlates with trajectory reliability. When model explanations change due to perturbations like fog or noise, trajectory errors increase 5.3x, suggesting reasoning consistency could serve as a safety monitoring tool for autonomous vehicles.
This research addresses a critical vulnerability in autonomous driving systems: the fragility of reasoning under real-world sensor degradation. While Vision-Language-Action models have shown promise in interpretable autonomous driving, their reliance on clean sensor inputs remains largely unexamined. The study systematically perturbed sensor inputs across 1,996 scenarios, introducing Gaussian noise, extreme lighting, and fog, to assess how explanations and trajectories degrade.
The correlation between explanation stability and trajectory reliability (r=0.99) is a significant finding for safety-critical systems. When a model's stated reasoning changes after perturbation, trajectory error rises from 4.1 m to 21.8 m (21.8 / 4.1 ≈ 5.3x), indicating that explanation instability signals dangerous behavior. This suggests explanations function as a leading indicator of safety failures, not merely post-hoc justifications.
For the autonomous vehicle industry, these findings create both challenges and opportunities. The linear degradation pattern (R²=0.957) across noise intensities enables predictable safety modeling, while the marginal effectiveness of standard preprocessing defenses highlights the need for reasoning-aware robustness techniques. Deploying VLA systems without runtime monitoring of explanation consistency could mask safety-critical failures, potentially exposing manufacturers to liability.
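To make the R² figure concrete, here is a minimal sketch of how a linear fit of trajectory error against noise intensity could be scored. The function name `linear_fit_r2` and the numbers below are illustrative placeholders, not data or code from the study; only the idea of fitting a line and reporting R² comes from the text above.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (NOT results from the study).
noise_levels = [0.0, 0.1, 0.2, 0.3, 0.4]      # hypothetical noise intensities
trajectory_err = [1.0, 2.1, 2.9, 4.2, 4.8]    # hypothetical errors in metres

slope, intercept, r2 = linear_fit_r2(noise_levels, trajectory_err)
print(f"slope={slope:.2f} m per unit noise, intercept={intercept:.2f} m, R^2={r2:.3f}")
```

An R² close to 1 means a straight line explains most of the variation in error, which is what would let an engineer extrapolate expected error at an untested noise level from the fitted slope.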
The practical implication centers on monitoring infrastructure: autonomous systems could continuously validate that their reasoning remains stable, triggering conservative driving behaviors or human intervention when explanation consistency drops below thresholds. This shifts safety engineering from trajectory-only validation to reasoning-aware deployment, establishing a new testing and monitoring standard for interpretable AI systems in autonomous driving.
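The monitoring idea described above could be sketched roughly as follows. This is a hypothetical design, not the study's method: the source does not say how explanation consistency is scored, so the sketch assumes a simple word-overlap (Jaccard) similarity between a reference explanation and the current one, a rolling average, and an arbitrary threshold of 0.6. The names `token_jaccard` and `ConsistencyMonitor` are invented for illustration.

```python
from collections import deque


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two explanations."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty explanations are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


class ConsistencyMonitor:
    """Rolling mean of explanation similarity (hypothetical design)."""

    def __init__(self, threshold: float = 0.6, window: int = 5):
        self.threshold = threshold          # assumed value, not from the study
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def update(self, reference: str, current: str) -> str:
        """Record one comparison and return 'nominal' or 'fallback'."""
        self.scores.append(token_jaccard(reference, current))
        mean_score = sum(self.scores) / len(self.scores)
        return "nominal" if mean_score >= self.threshold else "fallback"


monitor = ConsistencyMonitor(threshold=0.6, window=3)
stable = "slow down because pedestrian is crossing ahead"
drifted = "maintain speed because road ahead is clear"

states = [
    monitor.update(stable, stable),   # explanation unchanged
    monitor.update(stable, drifted),  # first drifted frame, smoothed by window
    monitor.update(stable, drifted),  # sustained drift pulls the mean down
]
print(states)  # ['nominal', 'nominal', 'fallback']
```

The rolling window is a design choice that trades reaction time for fewer false alarms: a single changed explanation does not trigger the fallback, but sustained drift does. A deployed system would likely need a semantic similarity measure rather than raw word overlap, and a threshold calibrated against measured trajectory error.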
- Explanation consistency correlates strongly with trajectory reliability (r=0.99), enabling safety monitoring through reasoning stability checks.
- Trajectory errors increase 5.3x when Chain-of-Causation explanations change due to sensor perturbations, demonstrating reasoning fragility.
- Standard input preprocessing defenses provide minimal protection against robustness degradation in VLA models.
- Linear degradation patterns across noise intensities (R²=0.957) enable predictable safety modeling for autonomous systems.
- Reasoning-based runtime monitoring could serve as a practical safety mechanism for VLA deployment in autonomous vehicles.