#3d-scene-understanding News & Analysis

8 articles tagged with #3d-scene-understanding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

Researchers introduce FlowMaps, a machine learning model that predicts how objects move in household environments by learning from human interaction patterns. The system enables robots to better navigate dynamic spaces and locate objects more reliably, demonstrated through over 600 real-world navigation episodes.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

Flame3D introduces a training-free framework that enables large language models to reason about 3D scenes compositionally without requiring 3D-specific training data. The system represents scenes as editable visual-textual memories and allows agents to synthesize custom spatial programs at inference time, achieving competitive results on existing benchmarks while opening new possibilities for multi-hop spatial reasoning.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

PhysInOne: Visual Physics Learning and Reasoning in One Suite

PhysInOne is a large-scale synthetic dataset containing 2 million videos across 153,810 dynamic 3D scenes designed to address the scarcity of physics-grounded training data for AI systems. The dataset covers 71 physical phenomena and includes comprehensive annotations, demonstrating significant improvements in physics-aware video generation, prediction, and property estimation when used to fine-tune foundation models.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

Researchers developed RieMind, a new AI framework that improves spatial reasoning in indoor scenes by 16-50% by separating visual perception from logical reasoning using explicit 3D scene graphs. The system grounds language models in structured geometric representations rather than processing videos end-to-end, achieving significantly better performance on spatial understanding benchmarks.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

MatterDoor: Sampling Zero-shot Spatio-semantic Priors using Generative Models

Researchers introduce MatterDoor, a method enabling autonomous robots to infer hidden room structure and semantics from doorway-occluded views using pretrained generative vision models without task-specific training. The approach combines VLM-guided outpainting, depth estimation, and semantic segmentation to generate 3D hypotheses of unobserved spaces, evaluated on a new Matterport3D-derived benchmark for robot navigation and object-reaching tasks.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes

Researchers introduce a computational method for pre-capture portrait photography planning that generates optimal human poses, camera angles, lighting, and exposure settings within 3D scenes before photos are taken. Rather than focusing on post-production editing, this approach uses a Photographic Scene Graph to represent scene affordances and lighting structure, enabling AI-guided planning that produces aesthetically superior portraits while maintaining physical feasibility.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding

Researchers introduce Curvature-Aware Captioning, a novel framework using non-Euclidean geodesic attention mechanisms to improve 3D scene understanding from point cloud data. The approach combines Oblique and Lorentz space geometries to simultaneously achieve precise object localization and coherent scene descriptions, demonstrating state-of-the-art results on ScanRefer and Nr3D benchmarks.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Ilov3Splat introduces a framework for understanding 3D scenes using natural language by combining 3D Gaussian Splatting with CLIP features and SAM masks. The method achieves better cross-view consistency and instance-level reasoning than prior approaches, enabling object identification without manual annotation.