When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT
Researchers studying lung CT imaging found that 2.5D CNNs provide the best balance of performance, stability, and computational efficiency for cancer screening compared to full 3D models or pure 2D approaches. The study challenges the assumption that 3D models are universally superior for volumetric medical imaging, revealing that 3D CNNs suffer from threshold instability while transformers produce unreliable degenerate predictions.
This arXiv study tests a common assumption in medical AI: that three-dimensional representations are inherently superior for volumetric imaging tasks. The researchers ran a controlled comparison across 1,977 lung CT samples, evaluating how model architecture and input dimensionality interact under standardized training conditions. Their finding that 2.5D CNNs offered a better overall trade-off than 3D alternatives has significant implications for medical imaging practitioners balancing model sophistication against real-world constraints.
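The summary does not spell out how the 2D, 2.5D, and 3D inputs were constructed. One common convention, assumed here and not confirmed by the source, is that a 2.5D input stacks a few adjacent slices as channels for a 2D network. The sketch below illustrates that convention on a synthetic volume; `make_inputs` and its `context` parameter are hypothetical names.

```python
import numpy as np

def make_inputs(volume: np.ndarray, center: int, context: int = 1):
    """Build 2D, 2.5D and 3D views of one CT volume shaped (depth, height, width).

    Assumed convention (not taken from the paper): 2.5D means the center
    slice plus `context` neighbouring slices on each side, stacked as channels.
    """
    depth = volume.shape[0]
    # 2D: a single axial slice with one channel -> (1, H, W)
    x2d = volume[center][np.newaxis]
    # 2.5D: neighbouring slices as channels; indices are clipped so that
    # edge slices repeat instead of running off the volume.
    idx = np.clip(np.arange(center - context, center + context + 1), 0, depth - 1)
    x25d = volume[idx]  # (2 * context + 1, H, W)
    # 3D: the whole volume with one channel -> (1, D, H, W)
    x3d = volume[np.newaxis]
    return x2d, x25d, x3d

# Synthetic stand-in for a CT volume; real data would come from DICOM/NIfTI files.
volume = np.random.default_rng(0).normal(size=(32, 64, 64)).astype(np.float32)
x2d, x25d, x3d = make_inputs(volume, center=0, context=1)
print(x2d.shape, x25d.shape, x3d.shape)
```

Under this convention the 2.5D input adds only a few channels to a 2D network, while the 3D input adds a full depth axis, which is where the difference in compute and memory cost comes from.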
The research emerges from growing recognition that architectural complexity doesn't automatically translate to clinical utility. As healthcare systems deploy AI diagnostics at scale, computational costs, model stability, and reproducibility matter as much as raw performance metrics. Two observations expose failure modes that single-metric evaluations mask: 3D models exhibited threshold instability, meaning their performance shifted unpredictably as the decision threshold changed, and transformers produced degenerate predictions such as all-positive classifications.
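Both failure modes can be surfaced by sweeping the decision threshold instead of reporting a single number. The following is a minimal sketch on toy data, not the authors' evaluation code; `confusion`, `sweep`, and the example scores are illustrative assumptions.

```python
def confusion(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sweep(y_true, scores, thresholds):
    """Return (threshold, sensitivity, specificity, degenerate) per threshold."""
    rows = []
    for t in thresholds:
        tp, fp, tn, fn = confusion(y_true, scores, t)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        # Degenerate: every sample received the same predicted class.
        degenerate = (tp + fp == 0) or (tn + fn == 0)
        rows.append((t, sens, spec, degenerate))
    return rows

# Toy scores clustered in a narrow band, so small threshold shifts flip many predictions.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.55, 0.60, 0.62, 0.58, 0.61, 0.65]
rows = sweep(y_true, scores, thresholds=[0.5, 0.6, 0.7])
for t, sens, spec, degenerate in rows:
    print(f"t={t:.1f} sens={sens:.2f} spec={spec:.2f} degenerate={degenerate}")
```

A model whose sensitivity and specificity swing sharply between neighbouring thresholds, or that collapses to a single class at the default threshold, can still post a respectable ROC-AUC, which is why the sweep is worth running.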
For the medical AI industry, this work provides evidence that resource efficiency and reliability should guide architecture selection, not theoretical assumptions about dimensionality. Engineers building cancer screening systems could reduce infrastructure costs while improving operational stability by adopting 2.5D approaches. The authors report wide confidence intervals, which reflect the uncertainty inherent in a limited dataset and position their findings as a practical framework rather than a definitive ranking.
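The summary does not say how the confidence intervals were computed. A percentile bootstrap is one standard way to attach an interval to ROC-AUC, and the sketch below assumes that method purely for illustration, using synthetic labels and scores; `roc_auc` and `bootstrap_ci` are hypothetical helper names.

```python
import random

def roc_auc(y_true, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC-AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic, weakly separable scores; not data from the study.
rng = random.Random(1)
y_true = [i % 2 for i in range(60)]
scores = [0.3 * y + rng.random() for y in y_true]
auc = roc_auc(y_true, scores)
lo, hi = bootstrap_ci(y_true, scores)
print(f"AUC={auc:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

When the intervals of two models overlap heavily, as the authors report, the ordering between them should be read as tentative.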
Future research should examine whether these dimensionality trade-offs hold across other volumetric imaging tasks and larger datasets. Standardized comparison protocols like this one could accelerate adoption of genuinely optimal solutions rather than overcomplicated architectures. The work demonstrates that thoughtful empirical analysis, not architectural prestige, should drive medical AI deployment decisions.
- 2.5D CNNs achieved the best performance-efficiency trade-off (ROC-AUC 0.682) for lung cancer screening versus pure 2D or full 3D models
- 3D CNNs demonstrated problematic threshold instability while Vision Transformers produced degenerate predictions, indicating architectural limitations for this task
- Wide overlapping confidence intervals suggest caution against absolute superiority claims and emphasize the importance of considering computational cost alongside accuracy
- The study challenges the widespread assumption that 3D representations are universally preferable for volumetric medical imaging analysis
- Practical deployment should prioritize stability, computational efficiency, and reliability over theoretical model complexity for clinical imaging systems