VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
Researchers introduce VLA-Pro, a framework that enhances vision-language-action (VLA) models for robotics by storing task-specific procedural memories during training and retrieving them at inference. The reported gains are large: up to a 207% relative improvement in simulation, and a rise in real-world success rate from 5.8% to 65%, pointing to progress in cross-task generalization for robotic manipulation.
VLA-Pro addresses a critical limitation in current vision-language-action models: their inability to effectively generalize to novel tasks by leveraging experience across different objects, scenes, and action patterns. The framework operates as a modular plug-and-play system that stores task-specific LoRA adapters as procedural memories during training, then dynamically retrieves and fuses these memories during inference based on multi-modal context. This architecture preserves the modularity and stability of execution while enabling sophisticated knowledge transfer.
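The store-retrieve-fuse loop described above can be illustrated with a small sketch. This is not VLA-Pro's implementation: the `ProceduralMemoryBank` class, the cosine-similarity retrieval, the top-k softmax weighting, and the `temperature` parameter are all assumptions chosen for illustration, and the context embeddings are hand-written stand-ins for whatever multi-modal encoder the real system uses. The only standard piece is the LoRA update itself, where a frozen weight `W` is adapted as `W + B @ A` with low-rank factors `A` and `B`.

```python
import numpy as np


class ProceduralMemoryBank:
    """Toy store of LoRA adapters keyed by a context embedding (hypothetical design)."""

    def __init__(self):
        self.names = []
        self.keys = []      # one unit-norm context embedding per stored task
        self.adapters = []  # (A, B) low-rank factors; the weight update is B @ A

    def store(self, name, context_embedding, lora_a, lora_b):
        """Save one task's adapter together with the context that identifies it."""
        key = np.asarray(context_embedding, dtype=float)
        self.names.append(name)
        self.keys.append(key / np.linalg.norm(key))
        self.adapters.append(
            (np.asarray(lora_a, dtype=float), np.asarray(lora_b, dtype=float))
        )

    def retrieve(self, query_embedding, top_k=2, temperature=0.1):
        """Return indices of the top_k most similar memories and their softmax weights."""
        query = np.asarray(query_embedding, dtype=float)
        query = query / np.linalg.norm(query)
        sims = np.stack(self.keys) @ query        # cosine similarity per memory
        top = np.argsort(sims)[::-1][:top_k]      # best matches first
        logits = sims[top] / temperature
        weights = np.exp(logits - logits.max())   # numerically stable softmax
        weights /= weights.sum()
        return top, weights

    def fused_delta(self, query_embedding, top_k=2, temperature=0.1):
        """Weighted sum of the retrieved low-rank updates (assumed fusion rule)."""
        top, weights = self.retrieve(query_embedding, top_k, temperature)
        return sum(
            w * (self.adapters[i][1] @ self.adapters[i][0])
            for i, w in zip(top, weights)
        )


# Usage: three stored "tasks" and a novel query that mostly resembles the first.
rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2
bank = ProceduralMemoryBank()
for task_name, key in [
    ("pick_cup", [1.0, 0.0, 0.0]),
    ("open_drawer", [0.0, 1.0, 0.0]),
    ("stack_blocks", [0.0, 0.0, 1.0]),
]:
    bank.store(
        task_name,
        key,
        rng.normal(size=(rank, d_in)),   # A: rank x d_in
        rng.normal(size=(d_out, rank)),  # B: d_out x rank
    )

base_weight = rng.normal(size=(d_out, d_in))  # stands in for a frozen backbone layer
query = [0.9, 0.1, 0.0]
top, weights = bank.retrieve(query)
adapted_weight = base_weight + bank.fused_delta(query)
```

Because the backbone weight is never modified in place, the adapters can be added or removed without retraining, which is the sense in which such a design is plug-and-play. A real system would apply a fused update per adapted layer and would need retrieval fast enough for the control loop.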
The work builds on a growing recognition that language models and multimodal systems benefit from structured memory mechanisms. Rather than relying solely on end-to-end training, VLA-Pro stores procedural memory explicitly, in a way analogous to how humans transfer skills across related tasks. Testing across RoboTwin, RLBench, and real-world environments suggests the approach generalizes across different robotic platforms and task complexities.
For the robotics and AI industry, these results signal meaningful progress toward practical general-purpose manipulation systems. The 207% relative improvement in simulation and the roughly elevenfold increase in real-world success rate (5.8% to 65%) are substantial gains that could accelerate deployment of robotic systems in manufacturing, logistics, and service sectors. The modularity of the approach should also appeal to developers, who can build on existing VLA backbones without architectural changes.
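The two headline figures use different units, so it helps to put them on the same footing. The sketch below uses hypothetical helper names and only the numbers quoted above. It converts the real-world success rates into a fold change (about 11.2) and a relative improvement, and converts the 207% relative improvement into the multiplier it implies.

```python
def fold_change(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


def relative_improvement_pct(before, after):
    """Gain expressed as a percentage of the starting value."""
    return (after - before) / before * 100.0


def multiplier_from_relative_pct(relative_pct):
    """A relative improvement of p percent means after = (1 + p/100) * before."""
    return 1.0 + relative_pct / 100.0


# Real-world success rates quoted in the article, in percent.
real_fold = fold_change(5.8, 65.0)
real_relative = relative_improvement_pct(5.8, 65.0)

# Simulation result quoted as "up to 207% relative improvement".
sim_multiplier = multiplier_from_relative_pct(207.0)

print(f"real-world fold change: {real_fold:.1f}x")
print(f"real-world relative improvement: {real_relative:.0f}%")
print(f"simulation multiplier: {sim_multiplier:.2f}x")
```

Read this way, the real-world gain is the larger of the two in relative terms, although it starts from a very low 5.8% baseline.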
Looking ahead, research will likely focus on scaling procedural memory systems to larger task distributions, optimizing memory retrieval efficiency for real-time robotic control, and exploring how procedural memory transfers across entirely different domains beyond manipulation.
- VLA-Pro stores task-specific LoRA adapters as procedural memories to enable cross-task generalization in robotic manipulation
- Real-world manipulation success rate increased from 5.8% to 65%, demonstrating practical viability for deployed systems
- Framework achieves up to 207% relative improvement in simulation environments across multiple robotic platforms
- Modular plug-and-play design allows integration with existing VLA backbones without architectural modifications
- Dynamic memory fusion mechanism enables robots to transfer manipulation experience to novel tasks while maintaining execution stability