CT-IDP: Segmentation-Derived Quantitative Phenotypes for Interpretable Abdominal CT Disease Classification
Researchers developed CT-IDP, a quantitative phenotyping framework that uses organ segmentation and derived descriptors to classify abdominal CT diseases through interpretable logistic regression. The approach outperformed vision-transformer baselines across multiple datasets, demonstrating the value of explainable AI in medical imaging.
CT-IDP represents a significant advancement in medical imaging analysis by prioritizing interpretability alongside performance. Rather than relying solely on deep learning black boxes, the framework extracts over 900 quantitative features from multi-organ segmentations—including morphometry, attenuation, and disease burden metrics—then applies sparse logistic regression to identify disease-specific patterns. This hybrid approach bridges the gap between traditional medical reasoning and modern machine learning.
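A minimal sketch of the first stage on a toy volume: per-organ morphometry and attenuation descriptors computed from a segmentation mask with NumPy. The label ids, voxel spacing, and descriptor names here are illustrative stand-ins, not CT-IDP's actual feature definitions.

```python
import numpy as np

# Toy stand-ins for a CT volume (Hounsfield units) and a multi-organ
# segmentation mask. Label ids and spacing are hypothetical examples.
rng = np.random.default_rng(0)
ct_hu = rng.normal(40.0, 15.0, size=(4, 4, 4))
labels = np.zeros((4, 4, 4), dtype=int)
labels[:2] = 1   # illustrative "liver" label
labels[2:] = 2   # illustrative "spleen" label
voxel_volume_ml = 1.5 * 1.5 * 1.5 / 1000.0  # 1.5 mm isotropic voxels -> mL

def organ_descriptors(ct, seg, organ_id, voxel_ml):
    """Simple volume and attenuation statistics for one labeled organ."""
    mask = seg == organ_id
    values = ct[mask]
    return {
        "volume_ml": float(mask.sum() * voxel_ml),
        "mean_hu": float(values.mean()),
        "std_hu": float(values.std()),
    }

liver = organ_descriptors(ct_hu, labels, 1, voxel_volume_ml)
spleen = organ_descriptors(ct_hu, labels, 2, voxel_volume_ml)
```

In the real pipeline, descriptors like these (repeated across dozens of organs and statistics) would form the 900+ element feature vector fed to the sparse classifier.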
The validation methodology demonstrates rigor uncommon in AI research. Training on MERLIN's 15,175 studies and external evaluation on Duke-Abdomen and AMOS datasets tests genuine generalization across institutions and populations. CT-IDP consistently outperformed a DINOv3 vision-transformer baseline (macro-AUC of 0.897 vs 0.880 on MERLIN, 0.877 vs 0.857 on Duke-Abdomen), suggesting that domain-specific feature engineering remains valuable even with modern transformer architectures.
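Macro-AUC, the metric reported above, is the unweighted mean of per-disease AUCs, so rare diagnoses count as much as common ones. A small pure-Python illustration using the pairwise-comparison definition of AUC (a sketch of the metric itself, not any particular library's implementation):

```python
def binary_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(y_true_per_label, scores_per_label):
    """Unweighted mean of per-label AUCs ('macro' averaging over diseases)."""
    aucs = [binary_auc(y, s) for y, s in zip(y_true_per_label, scores_per_label)]
    return sum(aucs) / len(aucs)
```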
For clinical deployment, interpretability carries substantial weight. Physicians can understand which organ measurements and descriptors drive diagnostic decisions, enabling validation against clinical knowledge and identification of potential biases. This transparency becomes critical in regulated healthcare environments where algorithmic accountability matters alongside accuracy.
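The interpretability claim rests on the model being a sparse linear classifier: each disease score is a weighted sum of named measurements, and a clinician can inspect which weights survive the sparsity penalty. A toy sketch of L1-penalized logistic regression via proximal gradient descent, with synthetic data and hypothetical feature names (not CT-IDP's actual descriptors or training procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
feature_names = ["liver_volume_ml", "spleen_mean_hu", "noise_1", "noise_2"]
n = 400
X = rng.normal(size=(n, 4))
# Only the first feature drives the synthetic disease label.
logits = 3.0 * X[:, 0]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(float)

def fit_l1_logistic(X, y, lam=0.02, lr=0.1, steps=2000):
    """Proximal gradient descent for L1-penalized logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))        # predicted probabilities
        grad = X.T @ (p - y) / len(y)          # gradient of the log-loss
        w -= lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty,
        # which shrinks uninformative coefficients toward zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w = fit_l1_logistic(X, y)
# Reading off the model: rank features by coefficient magnitude.
ranked = sorted(zip(feature_names, w), key=lambda t: -abs(t[1]))
```

Ranking the fitted coefficients recovers the informative feature and suppresses the noise features, which is the kind of audit a physician could perform against clinical knowledge.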
The framework's success with TotalSegmentator, an open-source segmentation tool, suggests reproducibility and accessibility. Future work likely involves expanding disease coverage, integrating temporal imaging data, and exploring feature interactions. The narrower performance gap on AMOS (macro-AUC 0.780 vs 0.756) indicates room for refinement on more challenging, diverse datasets.
- CT-IDP achieved 0.897 macro-AUC by combining organ segmentation with 900+ quantitative descriptors and sparse logistic regression.
- Interpretable features outperformed vision-transformer baselines across three independent datasets, validating the hybrid engineering approach.
- External validation on Duke-Abdomen and AMOS datasets confirmed generalization across institutions without model retraining.
- Domain-specific quantitative phenotypes enable clinician validation and bias detection compared to black-box deep learning models.
- Open-source TotalSegmentator integration suggests broad reproducibility and a potential clinical adoption pathway.