y0news

#3d-vision News & Analysis

6 articles tagged with #3d-vision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

RayDer introduces a unified transformer architecture that consolidates camera estimation, scene reconstruction, and rendering into a single model for self-supervised novel view synthesis from real-world video. The system achieves clean power-law scaling with data and compute while maintaining competitive performance with supervised approaches, addressing a key scalability challenge in 3D vision.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Stanford researchers introduced Merlin, a 3D vision-language foundation model for analyzing abdominal CT scans that processes volumetric medical images alongside electronic health records and radiology reports. The model was trained on over 6 million images from 15,331 CT scans and demonstrated superior performance compared to existing 2D models across 752 individual medical tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5

Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy

Researchers propose Vid-LLM, a new video-based 3D multimodal large language model that processes video inputs without requiring external 3D data for scene understanding. The model uses a Cross-Task Adapter module and Metric Depth Model to integrate geometric cues and maintain consistency across 3D tasks like question answering and visual grounding.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Knowledge Transfer Scaling Laws for 3D Medical Imaging

Researchers demonstrate that different 3D medical imaging domains (CT, MRI, PET) transfer knowledge asymmetrically during pretraining, following predictable power-law patterns. By optimizing data allocation based on these transfer dynamics, they achieve up to 58% performance gains over proportional sampling, revealing a hub-and-island structure where certain domains act as foundational knowledge sources for others.
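For readers unfamiliar with the term, a power-law pattern means a quantity follows y = a * x^b, which appears as a straight line on log-log axes. The sketch below fits such a curve by ordinary least squares in log space. It is a generic illustration, not the paper's procedure: the function name `fit_power_law` and the data points are hypothetical and synthetic.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Hypothetical helper for illustration; all inputs must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from y = 2.0 * x**-0.25 (made-up values,
# not results from any article listed here).
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [2.0 * s ** -0.25 for s in sizes]
a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With noisy real measurements the recovered exponent would only be approximate, and a log-space fit weights relative rather than absolute errors.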

AI · Neutral · arXiv – CS AI · May 1 · 6/10

CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining

Researchers introduce CLAMP, a novel 3D pre-training framework for robotic manipulation that combines point cloud processing with contrastive learning to capture spatial information missing from traditional 2D image-based approaches. The method demonstrates superior performance across simulated and real-world tasks by leveraging multi-view depth data and action-conditioned learning to improve policy efficiency.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

Researchers introduce SoPE (Spherical Coordinate-based Positional Embedding), a new method that enhances 3D Large Vision-Language Models by mapping point-cloud data into spherical coordinate space. This approach overcomes limitations of existing Rotary Position Embedding (RoPE) by better preserving spatial structures and directional variations in 3D multimodal understanding.
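As background on the coordinate space mentioned above, the sketch below converts a Cartesian point (x, y, z) into spherical coordinates (radius, polar angle, azimuth). It shows only the standard textbook conversion, not SoPE's embedding itself; the function name `cartesian_to_spherical` and the angle convention (polar angle measured from the +z axis) are assumptions chosen for illustration.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi) for a 3D point.

    r     : distance from the origin
    theta : polar angle from the +z axis, in [0, pi]
    phi   : azimuth in the x-y plane, in (-pi, pi]
    Hypothetical helper; the convention is an assumption, not SoPE's.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; return zeros by convention.
        return 0.0, 0.0, 0.0
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, z / r))
    theta = math.acos(cos_theta)
    phi = math.atan2(y, x)
    return r, theta, phi


# Convert a small, made-up point cloud.
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 1.0, 0.0)]
spherical = [cartesian_to_spherical(*p) for p in points]
for p, s in zip(points, spherical):
    print(p, "->", tuple(round(v, 4) for v in s))
```

In this representation, direction (theta, phi) is separated from distance (r), which is one way to read the summary's point about preserving directional variations.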