arXiv – CS AI
WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation
WorldFly introduces a world-model-based Vision-Language-Action (VLA) framework that lets UAVs navigate complex urban environments by predicting future states rather than relying solely on immediate observations. The system uses a dual-branch, coupled flow-matching mechanism to generate both video predictions and navigation actions. It targets a known weak spot of reactive navigation: dense urban scenes with severe occlusions and sharp directional changes, where the current view alone is often not enough to choose the next move.
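To make "flow matching over two coupled branches" concrete, here is a minimal, hypothetical sketch of a generic joint flow-matching objective and Euler sampler. It is not WorldFly's architecture: the summary does not describe the actual coupling, so here it is assumed to mean only that one model sees the noisy states of both branches at a shared time `t`. All names (`interpolate`, `joint_fm_loss`, `euler_sample`, `w_act`) and the tensor shapes are illustrative assumptions, and the linear (rectified-flow style) probability path is one common choice among several.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear path between noise x0 (t=0) and data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def joint_fm_targets(video_noise, video_data, act_noise, act_data, t):
    """Noisy inputs and velocity targets for both branches at a shared time t.

    Assumption: both branches use the same t and the same linear path.
    For a linear path the target velocity is constant: x1 - x0.
    """
    v_t = interpolate(video_noise, video_data, t)
    a_t = interpolate(act_noise, act_data, t)
    return v_t, a_t, video_data - video_noise, act_data - act_noise


def joint_fm_loss(model, video_noise, video_data, act_noise, act_data, t, w_act=1.0):
    """Sum of per-branch MSE flow-matching losses (w_act is a hypothetical weight).

    'Coupling' in this sketch only means the model receives both noisy
    states, so each predicted velocity can depend on the other branch.
    """
    v_t, a_t, v_target, a_target = joint_fm_targets(
        video_noise, video_data, act_noise, act_data, t
    )
    v_pred, a_pred = model(v_t, a_t, t)
    return float(np.mean((v_pred - v_target) ** 2) + w_act * np.mean((a_pred - a_target) ** 2))


def euler_sample(model, video_noise, act_noise, steps=10):
    """Integrate both branches jointly from t=0 to t=1 with explicit Euler steps."""
    v, a = video_noise.copy(), act_noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        dv, da = model(v, a, t)
        v = v + dt * dv
        a = a + dt * da
    return v, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes: 4 future frames x 8 latent dims, 4 waypoints x 3 coords.
    video_noise, video_data = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    act_noise, act_data = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))

    # An "oracle" that already knows the endpoints stands in for a trained network.
    def oracle(v, a, t):
        return video_data - video_noise, act_data - act_noise

    print(joint_fm_loss(oracle, video_noise, video_data, act_noise, act_data, t=0.3))
    v, a = euler_sample(oracle, video_noise, act_noise)
    print(np.allclose(v, video_data), np.allclose(a, act_data))
```

With the oracle standing in for a trained network, the loss is zero and Euler integration carries both branches from noise to their data endpoints. A real model would instead learn the velocity field from data, and its action branch would be conditioned on inputs such as the language instruction and current observation, which this toy omits.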