#grounding News & Analysis

8 articles tagged with #grounding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Molmo2 is a new open-source family of vision-language models that achieves state-of-the-art performance among open models, particularly in video understanding and pixel-level grounding tasks. The research introduces 7 new video datasets and 2 multi-image datasets collected without using proprietary VLMs, along with an 8B-parameter model that outperforms existing open-weight models and even some proprietary models on specific tasks.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Semantic Partial Grounding via LLMs

Researchers introduce SPG-LLM, an approach that uses large language models to speed up the grounding process in classical planning by identifying irrelevant objects and actions before computation. The method achieves significantly faster grounding times, often by orders of magnitude, across seven challenging benchmarks while maintaining or improving plan quality.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning

Researchers introduce NoRA, a visual reasoning benchmark that evaluates whether AI models can generate and justify appropriate actions in first-person video scenarios through explicit reasoning graphs. The benchmark reveals that current multimodal language models struggle to construct complete action spaces and properly ground decisions in visible evidence, highlighting a critical gap between selecting plausible actions and explaining them through verifiable reasoning.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning

Researchers introduce ROVER, a lightweight plugin that enhances multimodal large language models' ability to reason across multiple images by intelligently routing visual evidence to specific objects. The approach achieves significant performance improvements on grounded reasoning benchmarks while reducing computational overhead compared to existing methods.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

TRACE: Tourism Recommendation with Accountable Citation Evidence

Researchers introduce TRACE, a benchmark dataset for evaluating tourism recommendation systems that combine multi-turn dialogue, verifiable review citations, and rejection recovery. The dataset reveals a significant gap in existing conversational recommender systems: LLMs excel at recall but cite weakly, while retrieval-based systems ground better but struggle with accuracy and adaptation.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

Researchers introduce GroundedKG-RAG, a retrieval-augmented generation system that builds knowledge graphs directly grounded in source documents to improve long-document question answering. The system reduces resource consumption and hallucinations while keeping accuracy comparable to state-of-the-art models.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Researchers propose a hierarchical planning framework to analyze why LLM-based web agents fail at complex navigation tasks. The study reveals that while structured PDDL plans outperform natural language plans, low-level execution and perceptual grounding remain the primary bottlenecks rather than high-level reasoning.

AI · Bullish · Google DeepMind Blog · Dec 17 · 6/10 · 3

FACTS Grounding: A new benchmark for evaluating the factuality of large language models

Researchers have introduced FACTS Grounding, a new benchmark designed to evaluate how accurately large language models ground their responses in source material and avoid hallucinations. The benchmark includes a comprehensive evaluation system and online leaderboard to measure LLM factuality performance.