AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation
Researchers propose a Hierarchical Semantic-Geometric Map (HSGM) that connects 2D vision-language models to 3D spatial reasoning for embodied navigation tasks. By decoupling semantic understanding from geometric path planning, the framework achieves state-of-the-art zero-shot performance on navigation benchmarks, improving how AI agents interpret language instructions to navigate physical environments.