AIBullisharXiv โ CS AI ยท 5h ago1
๐ง
Chain of World: World Model Thinking in Latent Motion
Researchers introduce CoWVLA (Chain-of-World VLA), a new Vision-Language-Action model paradigm that combines world-model temporal reasoning with latent motion representation for embodied AI. The approach outperforms existing methods in robotic simulation benchmarks while maintaining computational efficiency through a unified autoregressive decoder that models both keyframes and action sequences.