Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models
Researchers introduce Omni-Geometry Knowledge Distillation (OGKD), a framework that improves vision-language model adaptation for medical imaging by respecting clinically meaningful class relationships rather than treating non-ground-truth classes equally. The method achieves 1.7%-2.8% accuracy improvements over prior approaches across 11 medical datasets while generalizing better to unseen classes.
OGKD addresses a fundamental limitation of current prompt- and adapter-based tuning of vision-language models for medical imaging. Existing methods penalize all non-target classes uniformly. This ignores the semantic structure of medical diagnoses, where some misclassifications carry greater clinical significance than others. OGKD instead injects inter-class relationships into the distillation process. The result is more stable decision boundaries, which matter for medical applications that must work with limited annotations.
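To make the contrast between uniform and geometry-aware targets concrete, here is a minimal NumPy sketch built on our own assumptions. It does not reproduce OGKD's published formulation. The class-similarity matrix `sim`, the smoothing mass `alpha`, the temperature `tau`, the four example classes, and all function names are hypothetical. A uniform (label-smoothing-style) target spreads the non-ground-truth mass evenly. The geometry-aware variant spreads it in proportion to each class's similarity to the true class.

```python
import numpy as np


def uniform_target(num_classes: int, gt: int, alpha: float = 0.2) -> np.ndarray:
    """Label-smoothing-style target: all non-ground-truth classes share alpha equally."""
    target = np.full(num_classes, alpha / (num_classes - 1))
    target[gt] = 1.0 - alpha
    return target


def geometry_target(sim: np.ndarray, gt: int, alpha: float = 0.2, tau: float = 0.2) -> np.ndarray:
    """Hypothetical geometry-aware target: alpha is split among non-ground-truth
    classes by a softmax over their similarity to the ground-truth class."""
    scores = sim[gt].astype(float)  # astype returns a copy, so sim is untouched
    scores[gt] = -np.inf            # the ground-truth class gets no share of alpha
    finite_max = scores[np.isfinite(scores)].max()
    weights = np.exp((scores - finite_max) / tau)
    weights /= weights.sum()
    target = alpha * weights
    target[gt] = 1.0 - alpha        # with alpha < 0.5 the true class stays the argmax
    return target


def kl_distill(target: np.ndarray, logits: np.ndarray) -> float:
    """KL(target || softmax(logits)), skipping zero-probability target entries."""
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    mask = target > 0
    return float(np.sum(target[mask] * (np.log(target[mask]) - log_p[mask])))


# Hypothetical 4-class example (illustrative similarities, not clinical data):
# 0 normal, 1 benign nodule, 2 malignant nodule, 3 pneumonia.
sim = np.array([
    [1.0, 0.3, 0.2, 0.4],
    [0.3, 1.0, 0.8, 0.2],
    [0.2, 0.8, 1.0, 0.3],
    [0.4, 0.2, 0.3, 1.0],
])
gt = 2
logits = np.array([0.1, 1.5, 2.0, -0.5])  # made-up student logits for one image

u = uniform_target(4, gt)
g = geometry_target(sim, gt)
print("uniform :", np.round(u, 3))
print("geometry:", np.round(g, 3))
print("KL uniform  :", kl_distill(u, logits))
print("KL geometry :", kl_distill(g, logits))
```

Both targets keep the ground-truth class dominant. Only the geometry-aware target concentrates the remaining mass on the clinically adjacent class (here the benign nodule), so a student that confuses closely related diagnoses is penalized less than one that confuses unrelated ones.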
The medical imaging domain faces distinct constraints. The sensitivity of patient data makes frozen backbone models necessary to preserve privacy, and annotation scarcity limits supervised learning. Prompt-based tuning emerged as an attractive solution, but its optimization strategy fails to exploit domain knowledge about how classes relate to one another. OGKD closes this gap with two mechanisms: Global Geometry-Aware Distillation for the overall image representation, and Label-Guided Geometry Distillation for fine-grained patch-level alignment. Both use directional knowledge targets that preserve accuracy while respecting clinical taxonomy.
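The summary names the two mechanisms but gives no equations. The following is therefore only a hypothetical NumPy sketch of how a global term and a label-guided patch term might be combined in a CLIP-style model. All of these are our assumptions rather than OGKD's published loss: the function names, the logit scale of 100, the patch-attention temperature `temp`, the weight `lam`, and the use of KL divergence to a shared soft target. Random vectors stand in for frozen-encoder outputs.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def kl_to_target(target: np.ndarray, logits: np.ndarray) -> float:
    """KL(target || softmax(logits)) for a single image."""
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    mask = target > 0
    return float(np.sum(target[mask] * (np.log(target[mask]) - log_p[mask])))


def global_logits(img_token, text_emb, scale=100.0):
    """Scaled cosine similarity between the global image token (D,)
    and the class text embeddings (C, D). scale=100 is an assumption."""
    return scale * (l2_normalize(text_emb) @ l2_normalize(img_token))


def label_guided_patch_logits(patches, text_emb, gt, scale=100.0, temp=0.07):
    """Hypothetical label-guided pooling: weight patch tokens (P, D) by their
    similarity to the ground-truth class text embedding, then score the
    pooled feature against every class."""
    p = l2_normalize(patches)
    t = l2_normalize(text_emb)
    a = (p @ t[gt]) / temp
    a = np.exp(a - a.max())
    a /= a.sum()               # attention weights over patches, summing to 1
    pooled = a @ p             # (D,) attention-weighted patch feature
    return scale * (t @ l2_normalize(pooled))


def dual_loss(img_token, patches, text_emb, target, gt, lam=1.0):
    """Hypothetical combined objective: global term plus lam * patch term."""
    l_global = kl_to_target(target, global_logits(img_token, text_emb))
    l_patch = kl_to_target(target, label_guided_patch_logits(patches, text_emb, gt))
    return l_global + lam * l_patch


# Random stand-ins for frozen image/text encoder outputs.
rng = np.random.default_rng(0)
D, C, P = 8, 4, 16
img_token = rng.normal(size=D)
patches = rng.normal(size=(P, D))
text_emb = rng.normal(size=(C, D))
target = np.array([0.05, 0.15, 0.75, 0.05])  # e.g. a geometry-aware soft target
gt = 2
loss = dual_loss(img_token, patches, text_emb, target, gt)
print(f"dual loss: {loss:.4f}")
```

In actual prompt tuning, `text_emb` would come from the frozen text encoder applied to learnable prompt vectors, and only those vectors would receive gradients. This sketch only evaluates the loss value. Weighting patches by their similarity to the ground-truth class text is one plausible reading of "label-guided"; the paper's attention scheme may differ.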
For healthcare AI development, this work shows measurable improvements on both seen classes and unseen (zero-shot) classes, with more reliable predictions than competing methods. The consistent 1.7%-2.8% accuracy gains across diverse medical datasets support the approach's robustness. This matters to developers building diagnostic tools and to institutions deploying AI in clinical settings. In those settings, even small accuracy gains can meaningfully improve diagnosis, and generalization to novel conditions signals real-world clinical applicability.
The framework's release as open-source code should accelerate adoption across medical imaging research. Future work should explore whether geometry-aware distillation principles extend beyond medicine to other domains with structured class taxonomies, which could influence broader VLM adaptation methodologies.
- OGKD framework respects clinically meaningful class relationships, improving medical imaging VLM adaptation accuracy by 1.7%-2.8%
- Dual distillation mechanism combines global image-token distillation with attentive patch-level alignment for fine-grained prediction
- Method demonstrates superior generalization to unseen disease classes compared to existing prompt-tuning approaches
- Framework addresses medical AI constraints including data sensitivity, limited annotations, and frozen backbone requirements
- Open-source implementation available, enabling rapid adoption across medical imaging research and development