AI · Bullish · arXiv – CS AI · 8h ago · 7/10
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
Researchers present AVIC, an adaptive framework that determines when and how much multimodal language models should invoke world models for visual imagination during spatial reasoning tasks. The system learns to invoke visual imagination only when necessary, reducing computational cost while matching or exceeding the performance of fixed imagination strategies and proprietary baselines such as GPT-4o.