AI · Bullish · arXiv CS AI · 5h ago
Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset
Stanford researchers introduced Merlin, a 3D vision-language foundation model for analyzing abdominal CT scans that processes volumetric medical images alongside electronic health records and radiology reports. The model was trained on over 6 million images from 15,331 CT scans and outperformed existing 2D models across 752 individual medical tasks.