Comparative evaluation of training strategies using partially labelled datasets for segmentation of white matter hyperintensities and stroke lesions in FLAIR MRI
Researchers developed and evaluated six training strategies for deep learning models to segment white matter hyperintensities and stroke lesions in MRI scans using partially labeled datasets. Pseudolabeling emerged as the most effective approach, successfully leveraging 2,052 MRI volumes with incomplete annotations to create reliable automated segmentation tools for cerebral small vessel disease monitoring.
This research addresses a critical challenge in medical AI: the scarcity of fully annotated training data. White matter hyperintensities and ischemic stroke lesions present as visually similar hyperintensities on FLAIR MRI sequences, making them difficult to differentiate even for expert radiologists. The study's innovation lies in systematically evaluating how partially labeled datasets—where some images lack annotations for one or both pathologies—can train effective segmentation models, a pragmatic approach reflecting real-world medical imaging environments.
The researchers curated a substantial cohort combining private and public datasets, with 1,341 volumes annotated for white matter hyperintensities (WMH) and 1,152 for ischemic stroke lesions (ISL). Of the six training strategies tested, pseudolabeling performed best: the model generates predictions for the annotations that are missing, and those predictions then serve as training targets. This finding has significant implications for medical AI development, where acquiring expert annotations remains expensive and time-consuming.
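The pseudolabeling idea can be illustrated with a minimal sketch. This is not the study's implementation: the function name `fill_pseudolabels`, the `IGNORE` marker for a class that was never annotated in a given volume, and the 0.5 probability threshold are all assumptions made for illustration. The sketch shows only the core step, in which expert annotations are kept where they exist and model predictions fill in the class channels that were never annotated.

```python
import numpy as np

# Hypothetical marker: this class channel was not annotated in this volume.
IGNORE = -1


def fill_pseudolabels(targets, probs, threshold=0.5):
    """Return training targets with missing annotations replaced by predictions.

    targets: integer array of shape (classes, ...) holding 0/1 where an expert
             annotation exists and IGNORE where the class was not annotated.
    probs:   float array of the same shape with the model's predicted
             probabilities for each class.
    threshold: assumed cut-off for turning a probability into a binary label.
    """
    targets = np.asarray(targets)
    probs = np.asarray(probs)
    # Binarize the model's predictions to obtain candidate pseudolabels.
    pseudo = (probs >= threshold).astype(targets.dtype)
    # Keep expert labels where present; use pseudolabels only for the gaps.
    return np.where(targets == IGNORE, pseudo, targets)


# Toy example: channel 0 (say, WMH) is annotated, channel 1 (say, ISL) is not.
example_targets = np.array([[1, 0, 0, 1],
                            [IGNORE, IGNORE, IGNORE, IGNORE]])
example_probs = np.array([[0.1, 0.9, 0.2, 0.3],
                          [0.8, 0.2, 0.6, 0.4]])
completed = fill_pseudolabels(example_targets, example_probs)
```

In the toy example the annotated channel is left untouched even where the model disagrees with it, and only the un-annotated channel is filled from the predictions. A real pipeline would operate on 3D volumes and would need to decide which model produces the predictions and whether they are refreshed during training, neither of which is specified here.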
For the healthcare technology sector, this work demonstrates that robust clinical tools can emerge from imperfect training data, potentially accelerating AI deployment in resource-constrained settings. The methodology is generalizable to other medical imaging tasks involving multiple pathologies or incomplete annotations. The ability to extract reliable biomarkers from existing imaging archives without requiring comprehensive re-annotation could unlock substantial value in longitudinal clinical studies and large-scale epidemiological research focused on cerebral small vessel disease—a major contributor to cognitive decline and stroke.
Future developments should explore whether these strategies transfer to other imaging modalities and pathology combinations, and whether integration with clinical workflows improves diagnostic accuracy compared to radiologist assessments alone.
- Pseudolabeling outperformed five alternative strategies for training segmentation models on partially labeled MRI data.
- Researchers successfully leveraged 2,052 MRI volumes with incomplete annotations to build reliable automated segmentation tools.
- The approach addresses the critical medical AI challenge of limited fully annotated datasets without sacrificing model performance.
- Results demonstrate viable pathways for automating biomarker extraction in large-scale clinical research programs.
- The methodology is generalizable to other medical imaging tasks with multiple pathologies or incomplete ground truth labels.