y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#scene-graphs News & Analysis

5 articles tagged with #scene-graphs. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AIBullisharXiv – CS AI Β· Mar 177/10
🧠

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

Researchers developed RieMind, a new AI framework that improves spatial reasoning in indoor scenes by 16-50% by separating visual perception from logical reasoning using explicit 3D scene graphs. The system grounds language models in structured geometric representations rather than processing videos end-to-end, achieving significantly better performance on spatial understanding benchmarks.

AINeutralarXiv – CS AI Β· Apr 156/10
🧠

Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks

Researchers introduce Spatial Atlas, a compute-grounded reasoning system that combines deterministic spatial computation with large language models to create spatial-aware research agents. The framework demonstrates competitive performance on two benchmarksβ€”FieldWorkArena for multimodal spatial question-answering and MLE-Bench for machine learning competitionsβ€”while improving interpretability by grounding reasoning in structured spatial scene graphs rather than relying on hallucinated outputs.

🏒 OpenAI🏒 Anthropic
AIBullisharXiv – CS AI Β· Mar 276/10
🧠

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.

AINeutralarXiv – CS AI Β· Feb 276/107
🧠

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

Researchers introduce PoSh, a new evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge, achieving better correlation with human judgments than existing methods. They also present DOCENT, a challenging benchmark dataset featuring artwork with expert-written descriptions to evaluate vision-language models' performance on complex image analysis.