AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Decoupling the Declarative from the Procedural in Vision-Language-Action Models
Researchers introduce w²VLA, a modular Vision-Language-Action (VLA) model that separates declarative knowledge (concepts and semantics) from procedural knowledge (task execution), so that a learned skill can transfer zero-shot to novel objects. The approach targets the brittleness of current VLA systems by routing information through compositional modulation instead of a single opaque transformer pass, and it is reported to generalize beyond the specific objects seen in training.
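To make the declarative/procedural split concrete, here is a minimal toy sketch in Python. It is not the w²VLA architecture, which the summary does not describe in detail. It assumes a generic FiLM-style modulation, where a concept supplies a per-feature scale and shift that condition a fixed skill. The names `Concept`, `Skill`, `modulate`, and `act` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass(frozen=True)
class Concept:
    """Declarative side (hypothetical): what the object is,
    encoded as a per-feature scale and shift."""
    name: str
    scale: Vector
    shift: Vector


@dataclass(frozen=True)
class Skill:
    """Procedural side (hypothetical): how to act,
    encoded as a weight matrix of shape [action_dim][feature_dim]."""
    name: str
    weights: List[Vector]


def modulate(features: Vector, concept: Concept) -> Vector:
    # FiLM-style conditioning (an assumption, not the paper's method):
    # each feature is scaled and shifted by the concept.
    return [s * f + b for f, s, b in zip(features, concept.scale, concept.shift)]


def act(skill: Skill, concept: Concept, features: Vector) -> Vector:
    # The skill's weights never change; only the concept-conditioned input does.
    conditioned = modulate(features, concept)
    return [sum(w * x for w, x in zip(row, conditioned)) for row in skill.weights]


# One skill, reused unchanged across two different concepts.
grasp = Skill("grasp", weights=[[1.0, 0.0], [0.0, 1.0]])
mug = Concept("mug", scale=[1.0, 2.0], shift=[0.0, 0.5])
bottle = Concept("bottle", scale=[0.5, 1.0], shift=[1.0, 0.0])

features = [2.0, 3.0]
action_mug = act(grasp, mug, features)
action_bottle = act(grasp, bottle, features)
```

The point of the sketch is the separation of concerns: swapping `mug` for `bottle` changes the action without touching the parameters of `grasp`, which is the kind of reuse that a decoupled design is meant to allow.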