#3d-reconstruction News & Analysis

43 articles tagged with #3d-reconstruction. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

ACE-GS: Acing the Trade-off with Accurate, Compact and Efficient 3D Gaussian Splatting

Researchers introduce ACE-GS, an optimized framework for 3D Gaussian Splatting that achieves 3.7x faster training than existing accelerated methods while maintaining superior rendering quality and compact storage. The system uses momentum-guided primitive management, statistical pruning, and frequency compensation to balance reconstruction speed with visual fidelity, converging in 3-5 minutes with up to 0.89 dB PSNR improvement over baseline methods.
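To put the reported 0.89 dB PSNR gain in perspective, the sketch below converts a PSNR difference into the equivalent change in mean squared error, using the standard definition PSNR = 10·log10(MAX² / MSE). The baseline MSE is a hypothetical value chosen for illustration; nothing here is taken from the ACE-GS code.

```python
import math

def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)

# Hypothetical baseline error (images scaled to [0, 1]); not from the paper.
baseline_mse = 1e-3
improved_mse = baseline_mse * mse_ratio_for_gain(0.89)

# Difference in PSNR between the improved and baseline errors, in dB.
gain = psnr(improved_mse) - psnr(baseline_mse)
```

Under this definition, a 0.89 dB gain corresponds to roughly 18.5% lower mean squared error, whatever the baseline value is.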

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

Generalized-CVO: Fast and Correspondence-Free Local Point Cloud Registration with Second Order Riemannian Optimization

Researchers propose Generalized-CVO, a fast point cloud registration method using second-order Riemannian optimization that achieves a 10x speedup over previous approaches. The technique reduces LiDAR tracking drift by more than 55% in sparse environments and improves robustness on object registration benchmarks.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

EgoAERO introduces a framework enabling robots to learn dexterous manipulation skills from single egocentric human videos without requiring pre-scanned object assets or CAD models. The system reconstructs hand-object trajectories and converts them into robot policies, supported by a new large-scale dataset (EgoDex-R) containing 4.3M RGB-D frames, achieving performance comparable to traditional asset-dependent methods.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

SAM 3D: 3Dfy Anything in Images

SAM 3D is a generative AI model that reconstructs 3D objects from single images, predicting geometry, texture, and layout with significant improvements over existing methods. The team developed a human-in-the-loop annotation pipeline to create large-scale training data and plans to release code, weights, and a benchmark dataset.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion

Researchers introduce Real2SAM2Real, a framework that enhances Video Diffusion Models by incorporating explicit 3D geometric caches extracted from SAM3D models, enabling more precise control over camera movements and scene dynamics while maintaining structural consistency in complex occlusions and high-motion scenarios.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

3D-LFM: Lifting Foundation Model

Researchers have developed the first 3D Lifting Foundation Model (3D-LFM), which reconstructs 3D structures from 2D landmarks without requiring correspondence across training data. The model uses a transformer architecture to achieve state-of-the-art performance across various object categories, with resilience to occlusions and noise.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models

Researchers introduce World2Mind, a training-free spatial intelligence toolkit that enhances foundation models' 3D spatial reasoning capabilities by up to 18%. The system uses 3D reconstruction and cognitive mapping to create structured spatial representations, enabling text-only models to perform complex spatial reasoning tasks.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Researchers introduce ZipMap, a new AI model for 3D reconstruction that achieves linear-time processing while maintaining accuracy comparable to slower quadratic-time methods. The system can reconstruct over 700 frames in under 10 seconds on a single H100 GPU, making it more than 20x faster than current state-of-the-art approaches like VGGT.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations

EgoWorld is a new AI framework that converts third-person camera views into first-person perspectives using 3D data and diffusion models. The technology addresses limitations in current methods and shows strong performance across multiple datasets, with applications in AR, VR, and robotics.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

CAOA -- Completion-Assisted Object-CAD Alignment

Researchers introduce CAOA, a method for aligning CAD models to real-world objects in 3D indoor scans by combining point cloud completion with symmetry-aware pose estimation. The approach achieves 17% accuracy improvement over existing methods and introduces S2C-Completion, a new benchmark dataset of 8,500+ annotated object-CAD pairs for advancing 3D reconstruction tasks.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement

Researchers introduce PSCT-Net, a novel AI framework that reconstructs 3D pediatric skull CT scans from sparse 2D X-rays using differentiable back-projection and attention mechanisms, reducing radiation exposure to children while maintaining diagnostic accuracy. The team also releases PedSkull-CT, a new pediatric-focused dataset addressing the lack of child-specific medical imaging benchmarks in existing research.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

3D Oral Modelling with Improved Vertex Distribution Using Matching-Based Learning

Researchers improved a deep learning framework for 3D oral reconstruction by introducing Hungarian matching and Repulsion Loss to achieve more uniform vertex distribution across predicted dental models. While numerical accuracy decreased from 77.49% to 68.02%, the trade-off eliminates vertex clustering in sparse regions, producing more clinically useful reconstructions from intraoral images.
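As a rough illustration of why one-to-one matching counters vertex clustering, the sketch below compares nearest-neighbour matching (which lets several targets share one predicted vertex) with a minimum-cost one-to-one assignment. It finds the optimum by brute force over permutations, which on tiny inputs gives the same result the Hungarian algorithm would; the point sets and function names are hypothetical and not taken from the paper.

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor_match(pred, target):
    """For each target point, pick the closest predicted vertex (many-to-one allowed)."""
    return [min(range(len(pred)), key=lambda i: sq_dist(pred[i], t)) for t in target]

def optimal_one_to_one_match(pred, target):
    """Minimum-total-cost one-to-one assignment, found by brute force.

    Only practical for tiny inputs; the Hungarian algorithm reaches the
    same optimum in polynomial time.
    """
    n = len(pred)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(sq_dist(pred[p[j]], target[j]) for j in range(n)),
    )
    return list(best)

# Hypothetical data: predicted vertices bunched together, targets spread out.
pred = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

nn = nearest_neighbor_match(pred, target)      # some predicted vertices go unused
one_to_one = optimal_one_to_one_match(pred, target)  # every vertex gets its own target
```

With nearest-neighbour matching, predicted vertex 1 is never selected and so receives no supervision, while the one-to-one assignment gives every predicted vertex a distinct target to move toward.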

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

CLONE: A 3DGS-Based Closed-Loop Differentiable Optimization Framework for Single-Image Normal Estimation

Researchers introduce CLONE, a 3D Gaussian Splatting-based framework that estimates surface normals from single images by creating a closed-loop differentiable optimization pathway. The method unifies discriminative and generative approaches through an image-geometry-image consistency loop, eliminating the need for explicit normal supervision while maintaining geometric accuracy and local detail.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

SCOUT: Semantic scene COverage via Uncertainty-guided Traversal

SCOUT is an online semantic exploration framework that enables robots to actively understand indoor environments by coupling real-time scene graph construction with uncertainty-guided traversal planning. The system builds 3D scene graphs with probabilistic object labels and structural relations, then uses uncertainty metrics to decide where robots should explore next, treating semantic scene completion as an operational objective rather than a passive mapping byproduct.
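The summary does not say which uncertainty metric SCOUT uses. As one plausible reading, the sketch below scores each scene-graph node by the Shannon entropy of its label distribution and picks the most uncertain node as the next place to look. The node names, probabilities, and the choice of entropy are all assumptions made for illustration.

```python
import math

def label_entropy(probs):
    """Shannon entropy (in nats) of a discrete label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def most_uncertain(nodes):
    """Return the key of the node whose label distribution has the highest entropy."""
    return max(nodes, key=lambda name: label_entropy(nodes[name]))

# Hypothetical scene-graph nodes with probabilities over three candidate labels.
scene_nodes = {
    "object_a": [0.90, 0.05, 0.05],  # fairly confident
    "object_b": [0.40, 0.35, 0.25],  # ambiguous
    "object_c": [1.00, 0.00, 0.00],  # certain
}

next_target = most_uncertain(scene_nodes)
```

A real planner would presumably weigh this score against travel cost and visibility; the sketch only shows the ranking step.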

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Deep Learning-based 3D Oral Cavity Reconstruction Using 2D Intraoral Images

Researchers propose a deep learning method that reconstructs 3D oral cavity models from just ten 2D intraoral images, eliminating the need for expensive scanning equipment or uncomfortable impression-taking procedures. Achieving 77.49% accuracy using MobileNetV2 and multi-head attention mechanisms, the approach offers a cost-effective alternative for dental modeling, though it currently exhibits uneven point distribution in reconstructed models.

AI · Neutral · arXiv – CS AI · Jun 4 · 5/10

SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning

Researchers introduce SFMambaNet, a novel deep learning architecture that combines spectral-frequency analysis with Mamba-based state space models to improve correspondence pruning—the task of filtering accurate feature matches from noisy initial sets. The method outperforms existing Graph Neural Network approaches by integrating frequency domain perception to better distinguish valid correspondences from outliers.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

A Survey of 3D Reconstruction with Event Cameras

A comprehensive survey reviews 3D reconstruction techniques using event cameras, which capture asynchronous per-pixel brightness changes rather than traditional frames. The research categorizes methods across stereo, monocular, and multimodal systems using geometry-based, deep learning, and neural rendering approaches, identifying key challenges in datasets, evaluation standards, and dynamic scene handling.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

Researchers present a novel view synthesis method using differentiable Multiplane Images (MPI) that achieves 30.7% faster rendering and uses 85.2% less memory than Gaussian Splatting approaches while maintaining competitive quality. The technique combines geometric initialization from visual foundation models with one-step diffusion to handle sparse-view conditions, making it practical for mobile deployment.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation

Researchers introduce MDA (Mixture-Density Ambiguity), a depth estimation technique that predicts multiple depth hypotheses per pixel rather than a single value, effectively eliminating 'flying points'—spurious 3D artifacts that appear in empty space between foreground and background surfaces near object boundaries.
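A small numeric sketch of why a single-value prediction produces flying points while a multi-hypothesis one need not: at a boundary pixel that could belong to either surface, the expected depth lands between the two surfaces, whereas the most likely hypothesis stays on one of them. The weights, depths, and the rule of selecting the highest-weight component are assumptions for illustration; the summary does not describe how MDA parameterizes or selects its hypotheses.

```python
def mixture_mean(weights, depths):
    """Expected depth under the mixture: what a single-value regressor tends toward."""
    return sum(w * d for w, d in zip(weights, depths))

def dominant_mode(weights, depths):
    """Depth of the highest-weight hypothesis."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return depths[best]

# Hypothetical boundary pixel: foreground at 1 m, background at 5 m.
weights = [0.55, 0.45]
depths = [1.0, 5.0]

averaged = mixture_mean(weights, depths)   # falls in empty space between surfaces
selected = dominant_mode(weights, depths)  # stays on the foreground surface
```

Here the averaged depth is about 2.8 m, a point that lies on neither surface, while the selected hypothesis is 1.0 m.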

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors

Researchers propose a self-supervised framework for monocular depth and pose estimation in endoscopy using a Generative Latent Bank and VAE to improve 3D mapping of the gastrointestinal tract. The method achieves superior performance over existing self-supervised approaches on standard endoscopic datasets without requiring synthetic training data.

AI · Neutral · arXiv – CS AI · Jun 1 · 5/10

Feature-Optimized Vision for Adaptive 3D Scene Reconstruction

Researchers propose an adaptive feature-selection system for 3D scene reconstruction that intelligently prioritizes visual data based on texture, repeatability, and geometric utility rather than using fixed thresholds. The method demonstrates improved reconstruction quality and computational efficiency across diverse scene types compared to baseline approaches, offering a modular enhancement for both classical and neural reconstruction pipelines.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

City-Mesh3R introduces a scalable framework for reconstructing high-fidelity 3D city-scale meshes directly from unordered image collections using a divide-and-conquer strategy. The method addresses limitations of existing NeRF and Gaussian Splatting approaches by producing watertight, simulation-ready meshes suitable for large urban scenes without prohibitive computational overhead.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

E$^3$C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control

Researchers introduce E³C, a video diffusion framework enabling controllable egocentric video generation with 3D environmental memory and separate human pose controls for both camera wearers and observed subjects. The system addresses unique challenges in first-person video synthesis by maintaining scene consistency while handling rapid viewpoint changes and partial occlusions.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Researchers introduce PaGeR, a framework that adapts 3D foundation models trained on perspective images to work with panoramic imagery, enabling geometry estimation from 360-degree scenes. The unified model predicts depth, surface normals, and sky masks from both standard and panoramic images in a single pass, achieving state-of-the-art performance on indoor and outdoor scenes.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

MuNet: A Mutualistic Network for Joint 3D Human Mesh Recovery and 3D Clothed Human Reconstruction from Single Images

Researchers introduce MuNet, a unified deep learning framework that jointly optimizes 3D human mesh recovery and clothed human reconstruction from single images using graph convolutional networks. The approach leverages mutualistic feedback between the two tasks to achieve state-of-the-art results across six benchmark datasets, with code released for research purposes.
