AI | Neutral | Importance 6/10

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

arXiv – CS AI | Vukasin Bozic, Isidora Slavkovic, Dominik Narnhofer, Nando Metzger, Denis Rozumny, Konrad Schindler, Nikolai Kalischek | May 27, 2026 at 04:00 AM

AI Summary

Researchers introduce PaGeR, a framework that adapts 3D foundation models trained on perspective images to work with panoramic imagery, enabling geometry estimation from 360-degree scenes. The unified model predicts depth, surface normals, and sky masks from both standard and panoramic images in a single pass, achieving state-of-the-art performance on indoor and outdoor scenes.

Analysis

PaGeR extends perspective-based 3D foundation models to the panoramic domain. Recent work in 3D geometry estimation has combined transformer architectures with large-scale pre-training to achieve strong results from multi-view and single-image inputs. Panoramic imagery, which captures a full 360-degree view, is harder to handle: it is typically stored in equirectangular projection, whose distortion grows toward the poles, so its geometry differs from that of the perspective images these models were trained on. PaGeR bridges the gap by keeping architectural modifications minimal and training on a mix of perspective and panoramic datasets, which lets the model retain the 3D priors learned by its foundation model.
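To make the equirectangular issue concrete, here is a minimal sketch of how a pixel in an equirectangular panorama maps to a viewing ray, and how much of the sphere each pixel covers. The axis convention and the function names `equirect_pixel_to_ray` and `pixel_solid_angle` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit ray direction.

    Assumed convention (illustrative, not from the paper):
    longitude runs from -pi (left edge) to +pi (right edge),
    latitude runs from +pi/2 (top) to -pi/2 (bottom),
    with x to the right, y up, and z forward.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def pixel_solid_angle(v, width, height):
    """Approximate solid angle (steradians) covered by one pixel in row v."""
    lat = (0.5 - (v + 0.5) / height) * math.pi
    d_lon = 2.0 * math.pi / width
    d_lat = math.pi / height
    return d_lon * d_lat * math.cos(lat)


if __name__ == "__main__":
    w, h = 1024, 512
    # A pixel on the equator covers far more of the sphere than one at the pole.
    print(pixel_solid_angle(h // 2, w, h), pixel_solid_angle(0, w, h))
```

Rows near the top and bottom of the image cover a much smaller solid angle per pixel than rows at the equator, which is one reason a model trained only on pinhole images does not transfer to panoramas without adaptation.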

The approach relies on transfer learning, which avoids the computational cost of training from scratch while adapting the model to omnidirectional geometry. A single framework predicts scale-invariant depth, metric depth, surface normals, and sky masks for both image types. For applications that need 360-degree scene understanding, such as virtual reality, autonomous navigation in large spaces, or immersive content creation, this is meaningful progress.
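The difference between scale-invariant and metric depth can be illustrated with a small sketch. A scale-invariant prediction is only defined up to one global factor, so a common evaluation practice is to fit that factor by least squares before computing an error, while sky pixels (which have no finite depth) are masked out. The helper names and the sample values below are made up for illustration, and this is not necessarily the evaluation protocol used in the paper.

```python
def align_scale(pred, gt, is_sky):
    """Least-squares scale s minimising sum((s * p - g)^2) over non-sky pixels."""
    pairs = [(p, g) for p, g, sky in zip(pred, gt, is_sky) if not sky]
    num = sum(p * g for p, g in pairs)
    den = sum(p * p for p, _ in pairs)
    return num / den


def abs_rel(pred, gt, is_sky):
    """Mean absolute relative error over non-sky pixels."""
    errs = [abs(p - g) / g for p, g, sky in zip(pred, gt, is_sky) if not sky]
    return sum(errs) / len(errs)


# Hypothetical values: prediction in arbitrary units, ground truth in metres.
pred = [0.5, 1.0, 2.0, 9.0]
gt = [1.0, 2.0, 4.0, 0.0]             # last entry is a placeholder for sky
is_sky = [False, False, False, True]  # sky mask excludes the last pixel

s = align_scale(pred, gt, is_sky)
aligned = [s * p for p in pred]

if __name__ == "__main__":
    print("scale:", s)
    print("error before alignment:", abs_rel(pred, gt, is_sky))
    print("error after alignment:", abs_rel(aligned, gt, is_sky))
```

A metric depth output needs no such alignment, because its values are already expressed in real-world units.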

The zero-shot generalization across diverse indoor and outdoor scenes suggests that the model has learned general geometric principles rather than memorizing dataset-specific patterns. The work shows how an existing foundation model can be extended to a new imaging domain without complete retraining. It fits the broader trend of adapting large-scale vision models to specialized tasks and may influence how future 3D reconstruction tools are built.

Key Takeaways

→ PaGeR adapts perspective-based 3D foundation models to panoramic imagery through minimal architectural changes and mixed-domain training
→ The unified model predicts multiple geometry outputs (depth, normals, sky masks) from both standard and 360-degree images in a single forward pass
→ It achieves state-of-the-art performance on both indoor and outdoor scenes, with strong zero-shot generalization
→ The transfer learning approach preserves the 3D priors of the foundation model while learning panorama-specific geometric consistency
→ The framework supports applications that require 360-degree scene understanding, such as VR, autonomous navigation, and immersive content creation