#scene-graphs News & Analysis

10 articles tagged with #scene-graphs. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

Researchers developed RieMind, a new AI framework that improves spatial reasoning in indoor scenes by 16-50% by separating visual perception from logical reasoning using explicit 3D scene graphs. The system grounds language models in structured geometric representations rather than processing videos end-to-end, achieving significantly better performance on spatial understanding benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models

Researchers propose CAPruner, a scene graph pruning method that enhances how large language models perform 3D spatial reasoning by preserving task-relevant relations rather than relying solely on spatial proximity. The approach combines fuzzy semantic relevance with spatial proximity to identify critical relations, addressing computational inefficiencies in 3D vision-language tasks.
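
The idea of scoring relations by both relevance and distance can be illustrated with a minimal sketch. This is not CAPruner's actual algorithm: the `Relation` fields, the exponential distance decay, the weighted sum, and the `semantic_weight` parameter are all assumptions made for illustration, and the relevance scores are taken as precomputed inputs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One scene-graph edge (hypothetical structure, for illustration only)."""
    subject: str
    predicate: str
    obj: str
    distance_m: float          # distance between the two objects, in meters
    semantic_relevance: float  # assumed precomputed relevance to the query, in [0, 1]


def proximity_score(distance_m: float, scale_m: float = 2.0) -> float:
    """Map a distance to (0, 1]; closer pairs score higher (assumed decay)."""
    return math.exp(-distance_m / scale_m)


def prune_relations(relations, keep: int, semantic_weight: float = 0.5):
    """Keep the `keep` highest-scoring relations under a weighted sum."""
    def score(r: Relation) -> float:
        return (semantic_weight * r.semantic_relevance
                + (1.0 - semantic_weight) * proximity_score(r.distance_m))
    return sorted(relations, key=score, reverse=True)[:keep]


relations = [
    Relation("mug", "on", "table", distance_m=0.1, semantic_relevance=0.1),
    Relation("remote", "near", "sofa", distance_m=3.0, semantic_relevance=0.9),
    Relation("lamp", "next to", "plant", distance_m=4.0, semantic_relevance=0.1),
]

# With equal weights, the distant but query-relevant relation ranks first.
kept = prune_relations(relations, keep=2)
# With semantic_weight=0.0 the ranking falls back to proximity alone.
nearest_only = prune_relations(relations, keep=1, semantic_weight=0.0)
```

Setting `semantic_weight` to zero reproduces a purely proximity-based pruner, which is the baseline behavior the summary says the method moves beyond.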

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Researchers introduce PhysScene, the first scene graph dataset specifically designed for physics experiments, enabling AI systems to understand complex scientific setups through structured visual reasoning. The dataset prioritizes semantic accuracy and relational density over scale, addressing a gap in domain-specific AI training data for scientific applications.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

PSG-Nav: Probabilistic Scene Graph Navigation via Multiverse Decision Making

Researchers introduce PSG-Nav, a novel navigation system that uses probabilistic scene graphs to help AI agents navigate complex environments while accounting for perception uncertainty. The system achieves state-of-the-art results on three major benchmarks by employing multiverse decision-making and an evidential calibrator to reduce false positives in open-vocabulary navigation tasks.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes

Researchers introduce a computational method for pre-capture portrait photography planning that generates optimal human poses, camera angles, lighting, and exposure settings within 3D scenes before photos are taken. Rather than focusing on post-production editing, this approach uses a Photographic Scene Graph to represent scene affordances and lighting structure, enabling AI-guided planning that produces aesthetically superior portraits while maintaining physical feasibility.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

Response-G1 is a framework for real-time video understanding that uses explicit scene graphs to align video evidence with query-specific response conditions. This lets Video-LLMs decide more accurately when to respond during streaming video analysis, without requiring fine-tuning.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks

Researchers introduce Spatial Atlas, a compute-grounded reasoning system that combines deterministic spatial computation with large language models to build spatially aware research agents. The framework performs competitively on two benchmarks, FieldWorkArena (multimodal spatial question answering) and MLE-Bench (machine learning competitions), and improves interpretability by grounding its reasoning in structured spatial scene graphs rather than relying on model output that may be hallucinated.

🏢 OpenAI · 🏢 Anthropic

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding

Researchers introduce 3D-VCD, an inference-time framework that reduces hallucinations in 3D-LLM embodied agents by contrasting predictions against distorted scene graphs. The method addresses failures specific to 3D spatial reasoning without requiring model retraining, advancing reliability in embodied AI systems.
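
Contrasting predictions against a distorted input can be sketched as follows. This is a minimal illustration of one common form of contrastive decoding, not necessarily the formulation used in 3D-VCD: the rule `(1 + alpha) * original - alpha * distorted`, the `alpha` parameter, the toy vocabulary, and the logit values are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(original, distorted, alpha=1.0):
    """Boost tokens supported by the real scene graph and penalize tokens
    that stay likely even when the graph is distorted (assumed rule)."""
    if len(original) != len(distorted):
        raise ValueError("logit vectors must have the same length")
    return [(1.0 + alpha) * o - alpha * d for o, d in zip(original, distorted)]


def pick_token(vocab, original, distorted, alpha=1.0):
    """Return the most probable token and its probability after contrasting."""
    probs = softmax(contrastive_logits(original, distorted, alpha))
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs[best]


# Toy logits (made up): "chair" stays likely even with a distorted graph,
# which suggests it comes from a language prior rather than the scene.
vocab = ["chair", "table", "sofa"]
logits_real_graph = [2.0, 1.8, 0.5]
logits_distorted_graph = [2.2, 0.4, 0.5]

plain_choice, _ = pick_token(vocab, logits_real_graph, logits_distorted_graph, alpha=0.0)
contrast_choice, _ = pick_token(vocab, logits_real_graph, logits_distorted_graph, alpha=1.0)
```

With `alpha=0.0` the adjustment is disabled and decoding is ordinary greedy selection. Because the contrast is applied only to logits at inference time, a scheme of this kind needs no retraining, which matches the summary's description.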

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Researchers introduced Graph-of-Mark (GoM), a visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models (MLMs). Tests across three open-source MLMs and four datasets showed that GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points over existing methods such as Set-of-Mark.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

Researchers introduce PoSh, a new evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge, achieving better correlation with human judgments than existing methods. They also present DOCENT, a challenging benchmark dataset featuring artwork with expert-written descriptions to evaluate vision-language models' performance on complex image analysis.