#3d-perception News & Analysis

4 articles tagged with #3d-perception. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

DVGT: Driving Visual Geometry Transformer

Researchers introduce DVGT, a transformer-based model for 3D scene reconstruction in autonomous driving that works without explicit camera parameters. Trained on multiple large driving datasets, it infers dense geometry directly from unposed multi-view sequences, removing the dependence on precise calibration data, and the authors report improved performance.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation

A survey of multi-modal 3D intelligence research finds significant advances in combining 3D data with complementary modalities such as camera images and textual descriptions, with applications in autonomous driving and world simulation. The review categorizes existing methods, benchmarks recent approaches, and highlights their strengths and limitations along with future research opportunities.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Manboformer: Learning Gaussian Representations via Spatial-temporal Attention Mechanism

Researchers propose Manboformer, an extension of GaussianFormer that adds spatial-temporal attention to improve 3D semantic occupancy prediction for autonomous driving. The method uses temporal information to address performance limitations in the original Gaussian-based approach; evaluation on the nuScenes dataset is ongoing.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Ilov3Splat introduces a framework for querying 3D scenes with natural language by combining 3D Gaussian Splatting with CLIP features and SAM masks. The method achieves better cross-view consistency and instance-level reasoning than prior approaches, enabling object identification without manual annotation.