AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation
Researchers introduce MGSD (Modality-Gap-Aware Self-Distillation), a self-distillation framework that improves vision-language models' visual spatial planning. During training, it uses symbolic state data to bridge the gap between perception and reasoning. At inference, the model works from visual input alone. The approach achieves 18–19% performance improvements on visual planning benchmarks.
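The summary does not give MGSD's actual training objective. The sketch below shows only the generic idea of self-distillation across modalities. It assumes that the same model acts as a "teacher" when given the symbolic state and as a "student" when given only the image, and that the student is trained to match the teacher's next-action distribution by minimizing KL(teacher ‖ student). The function names, the temperature parameter, and the four-action example are hypothetical. The "modality-gap-aware" component of MGSD is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical KL(teacher || student) for one next-action distribution.

    teacher_logits: outputs when the model sees the symbolic state (training only).
    student_logits: outputs from the same model when it sees only the image.
    In a real training loop the teacher side would be treated as a fixed target
    (no gradient flows through it); plain Python has no autograd, so that is implicit.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Skip zero-probability teacher entries, whose contribution to KL is 0.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example: logits over four moves (up, down, left, right).
teacher = [4.0, 0.5, 0.2, 0.1]  # symbolic state makes "up" clearly correct
student = [1.0, 1.2, 0.9, 1.1]  # image-only view is still uncertain
loss = distillation_loss(teacher, student)
```

A purely visual model at inference time is consistent with this setup: the symbolic state is needed only to produce the teacher targets during training.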