Planning-aligned Token Compression for Long-Context Autonomous Driving
Researchers propose COMPACT-VA, a planning-aligned token compression framework built on a conditional VQ-VAE that lets vision-action models in autonomous driving process extended temporal context within real-time computational budgets. The approach achieves over 6% improvement in driving success rates while delivering a 3.3x speedup and a 2.7x memory reduction compared to uncompressed processing.
The autonomous driving industry faces a basic computational constraint: vision-action models need extended temporal context to make safe decisions, yet processing long token sequences exceeds real-time compute budgets. COMPACT-VA addresses this by compressing historical context while preserving decision-critical information, a practical engineering step toward making end-to-end autonomous driving systems deployable.
The framework's key idea is planning-aligned compression. Rather than applying generic compression heuristics, the model conditions compression on a learned planning intent that is extracted from future trajectories during training and then predicted from compressed observations during inference. The compressed representation is therefore shaped to retain information relevant to actual driving decisions instead of discarding tokens arbitrarily.
The research evaluates the method on high-signal scenarios where context matters most: stop signs, yield situations, and proceed decisions. Across these behavioral metrics, COMPACT-VA maintains or exceeds baseline performance while substantially reducing computational requirements. The 3.3x speedup and 2.7x memory reduction bear directly on deployment feasibility on edge hardware, which matters for real-world autonomous vehicles operating under strict latency constraints. The closed-loop evaluation indicates that general driving performance is maintained beyond the targeted test scenarios, addressing a common validation gap in autonomous driving research.
This work reflects the broader trend of making large AI models more efficient through architectural changes rather than simply scaling compute. For the autonomous driving sector, efficiency gains of this magnitude could accelerate deployment timelines and reduce hardware costs per vehicle, which would help make the technology economically viable for mass production.
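To make the planning-aligned idea above more concrete, here is a minimal toy sketch of intent-conditioned vector quantization in plain Python. It is not the COMPACT-VA implementation: the function names (`intent_from_future`, `intent_from_observations`, `compress`), the 2-D token vectors, the fixed codebook, and the chunked softmax pooling are all assumptions made for illustration. It only shows the general pattern the summary describes: an intent signal comes from the future trajectory at training time and from observations at inference time, and that signal decides which history tokens dominate each compressed code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
    return dists.index(min(dists))

def intent_from_future(trajectory):
    """Hypothetical training-time intent: unit vector of net future displacement."""
    dx = trajectory[-1][0] - trajectory[0][0]
    dy = trajectory[-1][1] - trajectory[0][1]
    norm = math.hypot(dx, dy) or 1.0
    return [dx / norm, dy / norm]

def intent_from_observations(tokens):
    """Hypothetical inference-time intent: normalized mean of observed tokens.
    A real system would use a learned predictor; this is a stand-in."""
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(2)]
    norm = math.hypot(mean[0], mean[1]) or 1.0
    return [m / norm for m in mean]

def compress(tokens, intent, codebook, chunk_size):
    """Pool each chunk of tokens with intent-weighted attention, then
    replace the pooled vector by the index of its nearest codebook entry."""
    codes = []
    for start in range(0, len(tokens), chunk_size):
        group = tokens[start:start + chunk_size]
        # Tokens that align with the intent get larger pooling weights.
        scores = [sum(t * i for t, i in zip(tok, intent)) for tok in group]
        weights = softmax(scores)
        pooled = [sum(w * tok[d] for w, tok in zip(weights, group))
                  for d in range(len(intent))]
        codes.append(nearest_code(pooled, codebook))
    return codes

# Toy data: 12 history tokens, a 4-entry codebook, and 3 future waypoints.
tokens = [[math.cos(0.3 * k), math.sin(0.3 * k)] for k in range(12)]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
future = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5]]

train_codes = compress(tokens, intent_from_future(future), codebook, chunk_size=4)
infer_codes = compress(tokens, intent_from_observations(tokens), codebook, chunk_size=4)
print(len(tokens), "->", len(train_codes))  # prints "12 -> 3"
```

In this toy setup, 12 history tokens are reduced to 3 discrete code indices. The point of the design is that the same `compress` routine is used in both phases; only the source of the intent vector changes, so no future trajectory is needed at inference time.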
- COMPACT-VA uses a conditional VQ-VAE to compress extended driving context while preserving decision-critical information for safe autonomous vehicle behavior.
- Planning-aligned compression achieves over 6% improvement in driving success rates, a 3.3x speedup, and a 2.7x memory reduction versus uncompressed vision-action models.
- The framework predicts planning intent from observations without requiring future trajectory information during inference, enabling practical deployment.
- Closed-loop evaluation validates maintained general driving performance across diverse scenarios, not just isolated test cases.
- The token compression approach requires no backbone modifications, making it compatible with existing vision-action model architectures.