AI · Bullish · arXiv – CS AI · 6h ago · 7/10
Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA
Researchers demonstrate that multimodal large language models (MLLMs) struggle with confidence calibration in medical tasks, where their stated confidence often misaligns with actual accuracy. A new method combining Multi-Strategy Fusion-Based Interrogation with expert LLM assessment reduces calibration error by 40% across medical VQA datasets, addressing critical reliability concerns for AI-assisted diagnosis.
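The summary does not say which calibration metric the study reports. One common choice is expected calibration error (ECE), which bins predictions by stated confidence and averages the gap between confidence and accuracy in each bin. The sketch below assumes that metric purely for illustration; the function name, the 10-bin default, and the sample numbers are hypothetical and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the sample-weighted mean of |accuracy - mean confidence| per bin.

    confidences: stated confidence per answer, each in [0, 1]
    correct:     booleans, True where the answer was right
    n_bins:      number of equal-width confidence bins (assumed default: 10)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all samples.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Illustrative, made-up data: a model that states 90% confidence
# but is right on only 1 of 4 answers is heavily overconfident.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
# A model that states 100% confidence and is always right has no gap.
calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

Under this definition a lower value means stated confidence tracks accuracy more closely, so a reduction in calibration error like the one reported corresponds to a smaller average gap.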