FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
FADA is a unified vision-language model that performs fetal ultrasound interpretation, detection, and segmentation through a single pipeline, addressing critical diagnostic gaps in low- and middle-income countries where sonographer shortages limit prenatal screening. The system runs on consumer hardware and smartphones entirely offline, achieving clinically validated performance metrics while requiring no external labels at inference.
FADA targets a concrete infrastructure problem: over half of pregnant women in low- and middle-income countries lack access to skilled ultrasound screening because of sonographer shortages. The work shows how a single unified model can consolidate several specialized tasks (interpretation, classification, detection, and segmentation), removing the usual requirement for separate models and expert-specified labels.
The technical approach uses selective knowledge distillation from four domain-specific foundation models. Its key design choice is interpretation-first: the interpretation task is trained with standard fine-tuning rather than feature alignment, prioritizing clinical usability over isolated performance metrics. Validation relies on expert sonographer evaluation across 237 images, which found the outputs clinically acceptable, with 73.5% of interpretations receiving a perfect score under clinical guidance. This expert review grounds the results in practical clinical use rather than benchmark optimization alone.
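To make the idea of selective distillation concrete, here is a minimal sketch of a loss that adds a feature-alignment term only for some tasks and leaves the others on plain supervised fine-tuning. It is an illustration under stated assumptions, not the authors' implementation: the task names, the choice of which tasks are distilled, the MSE alignment term, and the `align_weight` parameter are all hypothetical.

```python
from typing import Sequence

# Assumption: which tasks receive teacher feature alignment. The summary only
# states that interpretation uses standard fine-tuning instead.
DISTILLED_TASKS = {"detection", "segmentation"}


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def selective_loss(
    task: str,
    task_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    align_weight: float = 1.0,  # hypothetical weighting of the alignment term
) -> float:
    """Add a feature-alignment penalty only for tasks marked as distilled.

    All other tasks (here, interpretation) keep the plain supervised loss,
    which corresponds to standard fine-tuning.
    """
    if task in DISTILLED_TASKS:
        return task_loss + align_weight * mse(student_feat, teacher_feat)
    return task_loss
```

In this sketch, "full" distillation would correspond to placing every task in `DISTILLED_TASKS`; the selective variant simply leaves interpretation out.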
Edge deployment changes the accessibility picture. With the model compressed to 0.8B parameters and run on commodity smartphones via GGUF quantization, FADA does not depend on cloud infrastructure, a critical requirement for healthcare systems in resource-constrained regions with unreliable connectivity. The full pipeline completes in 60 seconds on standard consumer devices, which makes point-of-care clinical support feasible in these settings.
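A rough back-of-the-envelope calculation shows why a 0.8B-parameter model fits on a phone. The sketch below estimates weight storage at a few nominal bit widths. The bit widths are assumptions chosen for illustration (the summary does not state which GGUF quantization level was used), and the estimate ignores the per-block scale metadata that quantized formats store as well as runtime activation memory.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights, ignoring format overhead."""
    return n_params * bits_per_weight / 8


N_PARAMS = 0.8e9  # parameter count quoted in the text

# Nominal bit widths; illustrative assumptions, not the paper's settings.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gigabytes = weight_bytes(N_PARAMS, bits) / 1e9
    print(f"{label}: {gigabytes:.1f} GB")
```

Under these assumptions the weights take about 1.6 GB at 16 bits, 0.8 GB at 8 bits, and 0.4 GB at 4 bits, so real quantized files would be somewhat larger than the last two figures.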
For healthcare technology development, FADA establishes a template for AI systems that prioritize accessibility alongside accuracy. The open-source release of code, models, and data enables rapid iteration and adaptation across different healthcare contexts. This approach aligns with broader trends in medical AI toward practical deployment optimization rather than marginal accuracy improvements on benchmarks.
- FADA performs clinical ultrasound interpretation, detection, and segmentation through one unified model, eliminating the need for multiple specialized models
- The system achieves clinically validated performance (Dice 0.8820 for segmentation, mAP 0.7671 for detection) while running entirely offline on commodity smartphones
- The selective distillation strategy outperforms full distillation by training interpretation tasks with standard fine-tuning rather than feature alignment
- Expert sonographer validation gave 73.5% of interpretations a perfect score, supporting clinical acceptability for both autonomous and human-in-the-loop deployment
- Edge deployment on Snapdragon 7 Gen 1 smartphones with a 60-second pipeline removes cloud dependency for resource-constrained healthcare settings
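
The Dice and mAP figures quoted above rest on simple overlap measures. As a minimal sketch, the code below computes the Dice coefficient for two binary masks and the intersection-over-union (IoU) of two boxes, the quantity that mAP uses to decide whether a detection matches a ground-truth box. The mask and box representations are assumptions for illustration, and the IoU thresholds behind the reported mAP are not given in this summary.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]           # rows of 0/1 pixels (assumed layout)
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for binary masks."""
    inter = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p & t
            pred_sum += p
            target_sum += t
    total = pred_sum + target_sum
    # Convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, a predicted mask covering two pixels that overlaps a one-pixel target in exactly one pixel has a Dice of 2/3, so a score of 0.8820 indicates substantially tighter agreement than that.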