y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#3d-vision News & Analysis

3 articles tagged with #3d-vision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles
AIBullisharXiv โ€“ CS AI ยท Mar 57/10
๐Ÿง 

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Stanford researchers introduced Merlin, a 3D vision-language foundation model for analyzing abdominal CT scans that processes volumetric medical images alongside electronic health records and radiology reports. The model was trained on over 6 million images from 15,331 CT scans and demonstrated superior performance compared to existing 2D models across 752 individual medical tasks.

AIBullisharXiv โ€“ CS AI ยท Mar 37/105
๐Ÿง 

Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy

Researchers propose Vid-LLM, a new video-based 3D multimodal large language model that processes video inputs without requiring external 3D data for scene understanding. The model uses a Cross-Task Adapter module and Metric Depth Model to integrate geometric cues and maintain consistency across 3D tasks like question answering and visual grounding.

AIBullisharXiv โ€“ CS AI ยท Feb 276/105
๐Ÿง 

SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

Researchers introduce SoPE (Spherical Coordinate-based Positional Embedding), a new method that enhances 3D Large Vision-Language Models by mapping point-cloud data into spherical coordinate space. This approach overcomes limitations of existing Rotary Position Embedding (RoPE) by better preserving spatial structures and directional variations in 3D multimodal understanding.