AI · Bearish · arXiv – CS AI · 5h ago · 7/10
Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
A comprehensive study evaluating five multimodal large language models (MLLMs) on real-world dermatology tasks finds a significant gap between benchmark performance and clinical applicability. The models reached up to 42% accuracy on public datasets, but performance fell to between 1.5% and 24.65% on actual hospital cases, which points to critical limitations in deploying these systems for clinical decision-making.
🧠 GPT-4