AI · Bullish · arXiv – CS AI · 6h ago
Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation
Researchers introduce Pri4R, an approach that gives Vision-Language-Action (VLA) models 4D spatiotemporal understanding during training. The method adds a lightweight point track head that predicts 3D point trajectories, which improves the model's understanding of the physical world. At inference the original architecture is kept unchanged, so there is no added computational overhead.
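
The training-only auxiliary head described above can be illustrated with a minimal sketch. It is not the Pri4R implementation: the `ToyVLA` class, its linear heads, the mean-squared-error losses, and the `track_weight` parameter are all hypothetical names and choices made for illustration, since the summary does not specify the architecture or the loss. The sketch only shows the general pattern, in which a point track head contributes to the training loss while the inference path never touches it.

```python
import random


class ToyVLA:
    """Hypothetical stand-in for a VLA policy; NOT the Pri4R architecture."""

    def __init__(self, feat_dim=4, action_dim=2, num_points=3, horizon=2, seed=0):
        rng = random.Random(seed)

        def init_matrix(rows, cols):
            # Small random weights for a plain linear layer.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Head used at both training and inference time.
        self.action_head = init_matrix(action_dim, feat_dim)
        # Auxiliary head (assumed layout): xyz per tracked point per future step.
        self.track_head = init_matrix(num_points * horizon * 3, feat_dim)

    @staticmethod
    def _linear(weights, x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    def act(self, features):
        """Inference path: only the action head runs."""
        return self._linear(self.action_head, features)

    def predict_tracks(self, features):
        """Training-only path: flat list of predicted 3D track coordinates."""
        return self._linear(self.track_head, features)


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(model, features, expert_action, gt_tracks, track_weight=0.5):
    """Action loss plus a weighted auxiliary track loss (weight is an assumption)."""
    action_loss = mse(model.act(features), expert_action)
    track_loss = mse(model.predict_tracks(features), gt_tracks)
    return action_loss + track_weight * track_loss
```

In this sketch the ground-truth tracks passed to `training_loss` play the role of the privileged signal: they are needed only to compute the training loss, and `act` produces the same output whether or not the track head exists.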