#spatial-understanding News & Analysis

2 articles tagged with #spatial-understanding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AIBearisharXiv – CS AI · May 16/10

🧠

Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation

Researchers find that vision-language models (VLMs) significantly underperform on relative camera pose estimation tasks, achieving only 66% accuracy compared to humans (91%) and specialized pipelines (99%). The study identifies specific gaps in multi-view spatial reasoning, including cross-view correspondence and projective camera-motion understanding, revealing concrete limitations in VLM capabilities beyond single-image tasks.

🧠 GPT-5

AIBullisharXiv – CS AI · Apr 66/10

🧠

Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

Researchers have developed Efficient3D, a framework that accelerates 3D Multimodal Large Language Models (MLLMs) while maintaining accuracy through adaptive token pruning. The system uses a Debiased Visual Token Importance Estimator and Adaptive Token Rebalancing to reduce computational overhead without sacrificing performance, showing +2.57% CIDEr improvement on benchmarks.