Continually Evolving Skill Knowledge in Vision Language Action Model
Researchers introduce Stellar VLA, a continual learning framework for vision-language-action models that improves knowledge accumulation without adding network parameters. The approach uses knowledge-guided expert routing and hierarchical task structures, achieving strong performance on robotics benchmarks with minimal data replay, and demonstrates transfer to a real-world platform.
Stellar VLA addresses a critical bottleneck in deploying large-scale vision-language-action models for robotics: enabling efficient continual learning without expanding model size. Traditional continual imitation learning methods require additional parameters or external modules, creating scalability challenges as VLA models grow larger. This research proposes a parameter-efficient solution that maintains performance while reducing computational overhead, a key concern for practical robotics deployment.
The framework's innovation lies in its knowledge-driven approach, which jointly optimizes task representations and a learned knowledge space through expert routing mechanisms. The hierarchical variant (TS-Stellar) structures learning around task-skill relationships, enabling better knowledge organization and transfer. This design reflects broader trends in machine learning toward more efficient adaptation mechanisms that leverage existing model capacity rather than continually expanding it.
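To make the routing idea concrete, here is a minimal sketch of what knowledge-guided expert routing could look like. It is an illustration based on the description above, not the paper's implementation: a task embedding is matched against a learned knowledge space (one key per expert), and the resulting weights mix experts that share the existing model's capacity. All names (`KnowledgeRouter`, `route`) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

class KnowledgeRouter:
    """Hypothetical sketch of knowledge-guided expert routing: a task
    embedding is compared to learned knowledge keys, and the resulting
    weights select among experts within the model's existing capacity."""

    def __init__(self, num_experts, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Learned knowledge space: one key vector per expert.
        self.keys = rng.normal(size=(num_experts, dim))

    def route(self, task_embedding):
        # Cosine similarity between the task embedding and each key,
        # turned into a probability distribution over experts.
        keys = self.keys / np.linalg.norm(self.keys, axis=1, keepdims=True)
        query = task_embedding / np.linalg.norm(task_embedding)
        return softmax(keys @ query)

router = KnowledgeRouter(num_experts=4, dim=8)
weights = router.route(np.ones(8))
# weights is a length-4 distribution over experts that sums to 1
```

In a full system these weights would gate lightweight expert modules during both training and inference; the key design point reflected here is that routing reuses a fixed set of experts rather than allocating new ones per task.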
For the robotics and embodied AI industries, Stellar VLA's validation on both simulated (LIBERO benchmark) and real dual-arm platforms demonstrates practical viability. The ability to achieve strong performance using only 1% data replay has significant implications for training efficiency and cost reduction in robotic systems. Real-world transfer with distinct embodiment and scene configurations indicates the framework generalizes beyond controlled environments.
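The 1% replay figure amounts to a simple data-mixing scheme during continual training: each new task's demonstrations are augmented with a tiny sample of earlier tasks' data. The sketch below is illustrative (the sampling scheme and `build_training_stream` name are assumptions, not the paper's code); only the ~1% ratio comes from the source.

```python
import random

def build_training_stream(new_task_data, old_task_data, replay_ratio=0.01, seed=0):
    """Mix a small replay sample of earlier tasks' demonstrations into
    the current task's training data. The ~1% ratio mirrors the replay
    budget reported for Stellar VLA; the mixing scheme is illustrative."""
    rng = random.Random(seed)
    k = max(1, int(replay_ratio * len(old_task_data)))
    replayed = rng.sample(old_task_data, k)
    stream = list(new_task_data) + replayed
    rng.shuffle(stream)
    return stream

stream = build_training_stream(list(range(1000)), list(range(1000, 2000)))
# 1000 new samples plus 10 replayed old samples (1% of 1000)
```

Because the replay budget is so small, the storage and compute cost of revisiting old tasks stays nearly flat as the task sequence grows, which is the efficiency claim the paragraph above highlights.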
Looking forward, the parameter-efficient continual learning approach could influence how large foundation models are adapted for embodied tasks across industries. The demonstrated knowledge retention and task discovery mechanisms may inform future architectures for multi-robot systems and transfer learning scenarios. Continued refinement of hierarchical task-skill structures could unlock more sophisticated robot learning paradigms.
- Stellar VLA enables continual learning in vision-language-action models without increasing network parameters, improving scalability for large robotics models.
- Knowledge-guided expert routing enables task specialization and efficient knowledge transfer with minimal data replay (1%).
- Hierarchical task-skill structure (TS-Stellar) outperforms flat approaches for complex manipulation tasks requiring multi-level reasoning.
- Real-world validation on dual-arm platforms with different embodiments confirms effective knowledge transfer beyond simulated environments.
- Parameter-efficient continual learning addresses a critical deployment challenge for embodied AI systems in practical robotics applications.
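The hierarchical task-skill structure mentioned above can be pictured as two-level routing: a task-level match is refined by a skill-level match within each task group. The sketch below is a hypothetical reading of that idea (class and method names are assumptions, not TS-Stellar's actual architecture).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

class HierarchicalRouter:
    """Hypothetical two-level sketch of a task-skill hierarchy: a
    task-level router weights task groups, and a skill-level router
    refines the choice within each group."""

    def __init__(self, dim, num_tasks, skills_per_task, seed=0):
        rng = np.random.default_rng(seed)
        self.task_keys = rng.normal(size=(num_tasks, dim))
        self.skill_keys = rng.normal(size=(num_tasks, skills_per_task, dim))

    def route(self, z):
        # Top level: distribution over task groups given embedding z.
        task_w = softmax(self.task_keys @ z)
        # Bottom level: distribution over skills within each task group.
        skill_w = np.stack([softmax(keys @ z) for keys in self.skill_keys])
        # Joint routing weights over (task, skill) pairs; sums to 1.
        return task_w[:, None] * skill_w

r = HierarchicalRouter(dim=8, num_tasks=3, skills_per_task=4)
joint = r.route(np.ones(8))
# joint is a 3x4 matrix of (task, skill) routing weights summing to 1
```

Structuring the weights this way lets related tasks share skill-level experts, which is one plausible mechanism behind the knowledge organization and transfer benefits the article attributes to TS-Stellar.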