#spatial-reasoning News & Analysis

72 articles tagged with #spatial-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

Researchers developed RieMind, a new AI framework that improves spatial reasoning in indoor scenes by 16-50% by separating visual perception from logical reasoning using explicit 3D scene graphs. The system grounds language models in structured geometric representations rather than processing videos end-to-end, achieving significantly better performance on spatial understanding benchmarks.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models

Researchers introduce World2Mind, a training-free spatial intelligence toolkit that enhances foundation models' 3D spatial reasoning capabilities by up to 18%. The system uses 3D reconstruction and cognitive mapping to create structured spatial representations, enabling text-only models to perform complex spatial reasoning tasks.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

Researchers introduce BEVLM, a framework that integrates Large Language Models with Bird's-Eye View representations for autonomous driving. The approach improves LLM reasoning accuracy in cross-view driving scenarios by 46% and enhances end-to-end driving performance by 29% in safety-critical situations.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

Research shows that static word embeddings like GloVe and Word2Vec can recover substantial geographic and temporal information from text co-occurrence patterns alone, challenging assumptions that such capabilities require sophisticated world models in large language models. The study found these simple embeddings could predict city coordinates and historical birth years with high accuracy, suggesting that linear probe recoverability doesn't necessarily indicate advanced internal representations.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

PROSPECT: Unified Streaming Vision-Language Navigation via Semantic-Spatial Fusion and Latent Predictive Representation

Researchers propose PROSPECT, a new AI system that combines semantic understanding with spatial modeling for improved Vision-Language Navigation. The system uses streaming 3D spatial encoding and predictive representation learning to achieve state-of-the-art performance in robot navigation tasks.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics

Researchers have developed TIGeR, a framework that enhances Vision-Language Models with precise geometric reasoning capabilities for robotics applications. The system enables VLMs to execute centimeter-level accurate computations by integrating external computational tools, moving beyond qualitative spatial reasoning to quantitative precision required for real-world robotic manipulation.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

Researchers introduce SpatialBench, a comprehensive benchmark for evaluating spatial cognition in multimodal large language models (MLLMs). The framework reveals that while MLLMs excel at perceptual grounding, they struggle with symbolic reasoning, causal inference, and planning compared to humans who demonstrate more goal-directed spatial abstraction.

AI · Bearish · arXiv – CS AI · Mar 4 · 6/10 · 3

SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Researchers introduce SpatialText, a diagnostic framework to test whether large language models can truly reason about spatial relationships or merely rely on linguistic patterns. The study reveals that current AI models fail at egocentric perspective reasoning despite proficiency in basic spatial fact retrieval.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

Learning Object-Centric Spatial Reasoning for Sequential Manipulation in Cluttered Environments

Researchers developed Unveiler, a robotic manipulation framework that uses object-centric spatial reasoning to retrieve items from cluttered environments. The system achieves up to 97.6% success in simulation by separating high-level spatial reasoning from low-level action execution, and demonstrates zero-shot transfer to real-world scenarios.

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7

Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning

Researchers developed Compositional-ARC, a dataset that tests whether AI models can systematically generalize on abstract spatial reasoning tasks. A small 5.7M-parameter transformer trained with meta-learning outperformed large language models such as GPT-4o and Gemini 2.0 Flash on novel combinations of geometric transformations.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

HG-Bench: A Benchmark for Multi-Page Handwritten Answer-Region Grounding in Automated Homework Assessment

Researchers introduce HG-Bench, a benchmark dataset of 500 annotated homework samples for evaluating automated grading systems' ability to locate and decompose handwritten student answers across multiple pages. Current AI models, including frontier VLMs, achieve less than 55% accuracy on complete answer localization, revealing a significant gap in their ability to reason about the spatial structure of handwritten documents.

AI · Neutral · arXiv – CS AI · Jun 25 · 5/10

Position Spaces and Graphs

Researchers introduce position graphs, a graph-based reasoning framework that formalizes spatial relationships between discrete tokens using strict partial orders. The work establishes theoretical foundations for consistency conditions and proves that pattern discovery within position graphs is NP-complete, with implications for document processing and spatial reasoning systems.
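
For readers unfamiliar with the term, a strict partial order is a relation that is irreflexive and transitive (and therefore asymmetric). The sketch below is only an illustration of that definition, not the paper's method; the token names and the `above` relation are hypothetical.

```python
from itertools import product

def is_strict_partial_order(elements, relation):
    """Return True if `relation` (a set of (a, b) pairs meaning "a precedes b")
    is irreflexive and transitive on `elements`."""
    rel = set(relation)
    # Irreflexive: no element may precede itself.
    if any((x, x) in rel for x in elements):
        return False
    # Transitive: a < b and b < c must imply a < c.
    for (a, b), (c, d) in product(rel, repeat=2):
        if b == c and (a, d) not in rel:
            return False
    return True

# Hypothetical layout tokens ordered by "is above" on a page.
tokens = ["title", "body", "footer"]
above = {("title", "body"), ("body", "footer"), ("title", "footer")}

print(is_strict_partial_order(tokens, above))                          # True
print(is_strict_partial_order(tokens, above | {("footer", "title")}))  # False
```

The second call fails because adding a cycle forces, by transitivity, a pair such as `("title", "title")` that irreflexivity forbids.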

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Researchers propose ReRe, a training-free framework that improves spatial reasoning in egocentric videos by having multimodal AI models first form a hypothesis, then revise it using synthesized novel viewpoints. The approach demonstrates significant performance gains on spatial reasoning benchmarks without modifying existing model architectures.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

Researchers propose SVoT, a reinforcement learning framework that enhances multimodal AI models' spatial reasoning by generating verifiable intermediate states and visualizations. The approach achieves up to 65% accuracy gains on out-of-distribution tests by explicitly modeling state transitions and verification processes, addressing a critical limitation in current large language models.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models

Researchers propose CAPruner, a scene graph pruning method that enhances how large language models perform 3D spatial reasoning by preserving task-relevant relations rather than relying solely on spatial proximity. The approach combines fuzzy semantic relevance with spatial proximity to identify critical relations, addressing computational inefficiencies in 3D vision-language tasks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning

Researchers propose PVPO, a sample-efficient reinforcement learning method that improves LLM-based LEGO assembly generation by addressing PhysHack, a failure mode where structures satisfy physical constraints but lack semantic or geometric coherence. The approach uses selective data training and couples physical feasibility with geometric rewards, achieving better structural alignment while reducing reliance on rejection sampling.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

Researchers demonstrate that textual supervision significantly improves how vision-language models understand geospatial information, with language serving as a complementary modality to visual data. The study analyzes geospatial representations across vision-only, vision-language, and multimodal foundation models, revealing systematic gaps in spatial accuracy that can be addressed through improved multimodal learning approaches.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Researchers introduce Brick-Composer, a learning framework that enhances multimodal large language models (MLLMs) with physical assembly capabilities through targeted training on brick construction tasks. The study reveals current MLLMs lack reliable spatial reasoning and fine-grained object recognition needed for real-world assembly, but demonstrates that structured learning approaches can improve performance significantly.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation

WorldFly introduces a world-model-based Vision-Language-Action framework that enables UAVs to navigate complex urban environments by predicting future states rather than relying solely on immediate observations. The system uses a dual-branch coupled flow matching mechanism to generate both video predictions and navigation actions, addressing critical limitations in dense urban scenarios with severe occlusions and sharp directional changes.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation

Researchers propose a Hierarchical Semantic-Geometric Map (HSGM) that bridges the gap between 2D vision-language models and 3D spatial reasoning for embodied navigation tasks. The framework achieves state-of-the-art zero-shot performance on navigation benchmarks by decoupling semantic understanding from geometric path planning, demonstrating significant advances in how AI agents interpret language instructions to navigate physical environments.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing

Researchers introduce PlanarBench, a benchmark that evaluates large language models' spatial reasoning abilities by testing whether they can draw planar graphs as ASCII art from edge lists. Testing 91 models on 199 non-isomorphic connected planar graphs reveals that edge count, not node count, is the dominant difficulty predictor, challenging assumptions in prior LLM graph benchmarking methodologies.
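
As background on why edge count matters for planar graphs, a simple planar graph with n ≥ 3 vertices can have at most 3n − 6 edges. The sketch below checks that bound on an edge list; it is a necessary condition only (K3,3 passes it yet is not planar) and is not taken from the benchmark itself. The function name `may_be_planar` is a hypothetical choice.

```python
from itertools import combinations

def may_be_planar(num_nodes, edges):
    """Necessary (not sufficient) planarity check for a simple graph:
    a planar graph with n >= 3 vertices has at most 3n - 6 edges."""
    # Ignore self-loops and duplicate or reversed edges.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    if num_nodes < 3:
        return True
    return len(unique) <= 3 * num_nodes - 6

k4 = list(combinations(range(4), 2))  # complete graph on 4 nodes, 6 edges
k5 = list(combinations(range(5), 2))  # complete graph on 5 nodes, 10 edges

print(may_be_planar(4, k4))  # True  (6 <= 6)
print(may_be_planar(5, k5))  # False (10 > 9)
```

A full planarity test needs a dedicated algorithm; the bound is only a cheap filter that rejects graphs with too many edges.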

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Researchers introduce SpatialAct, a benchmark testing whether vision-language models (VLMs) can understand 3D spatial layouts, reason about them coherently, and act upon that reasoning over multiple turns. The study reveals VLMs excel at isolated spatial reasoning tasks but fail to maintain consistent spatial understanding and produce reliable actions when environments change, indicating a significant gap between perception and practical action capabilities.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

Researchers introduce ERGeoBench, a comprehensive benchmark for evaluating multimodal large language models (MLLMs) on embodied geo-localization tasks using 2,207 street-view panoramas across three progressive difficulty settings. The evaluation reveals that current leading models can understand high-level geographic semantics but struggle with fine-grained perception, metric localization, and spatial consistency, highlighting that accurate geo-localization requires integrated perception and reasoning rather than isolated visual recognition.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

Researchers propose a framework to evaluate how linguistic structures and contextual features shape Large Language Model behavior in spatial reasoning tasks. The study reveals that topological information provides robust navigation planning, linguistic format effectiveness depends on model size, and semantic errors can critically undermine performance.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials

Researchers introduced AtomWorld, a benchmark for evaluating how well large language models can perform spatial reasoning tasks in materials science, specifically atomic structure manipulation. The study reveals that current LLMs like Claude Opus 4.6 struggle with complex spatial operations, achieving success rates below 12% for rotation tasks, suggesting they function better as collaborative tools than autonomous scientific agents.

🧠 Claude · 🧠 Opus
