
ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models

arXiv – CS AI | Matteo Merler, Nicola Dainese, Minttu Alakuijala, Giovanni Bonetta, Pietro Ferrazzi, Yu Tian, Bernardo Magnini, Pekka Marttinen
🤖 AI Summary

Researchers introduce ViPlan, the first benchmark for comparing Vision-Language Model planning approaches. They find that VLM-as-grounder methods excel in visually grounded tasks like Blocksworld, while VLM-as-planner methods perform better in household robotics scenarios. The study also reveals fundamental limitations in current VLMs' visual reasoning abilities, with Chain-of-Thought prompting showing no consistent benefit.

Key Takeaways
  • ViPlan is the first open-source benchmark to compare VLM-grounded symbolic planning with direct VLM planning methods.
  • VLM-as-grounder approaches solved 46% of Blocksworld tasks, compared to 9% for direct VLM planning methods.
  • In household robotics tasks, VLM-as-planner methods significantly outperformed VLM-as-grounder approaches (34% vs. 5% success rate).
  • Chain-of-Thought prompting showed no consistent benefits across methods, highlighting persistent VLM limitations.
  • The benchmark reveals that different planning approaches excel in different domains, depending on visual complexity and linguistic knowledge requirements.