LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data
Researchers show that large language models (LLMs) applied to clinical tabular data lack genuine awareness of their own knowledge limits, and they use cross-model attribution divergence to detect these epistemic blind spots. The LLMs' verbalized confidence stays nearly constant regardless of actual accuracy, whereas a new cross-model calibrator delivers reliable uncertainty estimates without access to model internals or retraining.
This research addresses a critical vulnerability in deploying LLMs to high-stakes domains such as clinical decision support. Although LLMs have shown impressive capabilities across many tasks, applying them to structured medical data reveals fundamental gaps in epistemic self-awareness. The study's core finding, that an LLM's verbalized confidence bears no relationship to its prediction quality, challenges the assumption that prompting strategies alone can induce genuine uncertainty quantification.
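Whether verbalized confidence tracks accuracy can be checked with a reliability analysis. The minimal sketch below computes expected calibration error (ECE) with equal-width confidence bins. This is an assumption: the summary does not say which calibration-error metric the paper reports, and the function name `expected_calibration_error` is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Clamp so that c == 1.0 falls into the last bin instead of overflowing.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        acc = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / n) * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy data (made up): always ~90% confident, but right only half the time.
    print(expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5))  # ~0.4
```

A model that reports about 90% confidence on every case while being correct only half the time scores an ECE near 0.4, which is the signature of confidence that ignores accuracy.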
The research builds on growing concerns about AI reliability in healthcare, where overconfidence in incorrect predictions carries tangible risks. Prior work has shown that LLMs struggle with structured reasoning, but this paper goes further by demonstrating that existing confidence mechanisms are inadequate. The inverse difficulty effect is particularly striking: LLM performance degrades precisely on the cases where the comparison model, XGBoost, is most confident, suggesting that the two kinds of model operate on fundamentally different decision boundaries.
The practical significance lies in the proposed solution: cross-model attribution divergence provides a post-hoc calibration mechanism that requires neither access to model internals nor expensive retraining. The approach is immediately applicable for organizations that already deploy LLMs on clinical data and need reliability estimates. Reducing calibration error from 0.254 to 0.080, a relative drop of about 69%, is a substantial improvement in patient-specific confidence assessment.
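A post-hoc calibrator of this kind can be sketched as a one-feature logistic regression that maps a per-patient attribution-divergence score to the probability that the LLM's prediction is correct, fitted on held-out cases with known outcomes. This is a hypothetical illustration, not the paper's calibrator: the class `DivergenceCalibrator`, the logistic form, and the hyperparameters are assumptions, and the divergence scores are taken as precomputed inputs.

```python
import math
from typing import Sequence


class DivergenceCalibrator:
    """Hypothetical one-feature calibrator: P(correct) = sigmoid(w * d + b),
    where d is a per-patient attribution-divergence score."""

    def __init__(self, lr: float = 0.5, epochs: int = 2000) -> None:
        self.lr = lr          # gradient-descent step size (assumed value)
        self.epochs = epochs  # full passes over the calibration set (assumed)
        self.w = 0.0
        self.b = 0.0

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Numerically stable logistic function for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def fit(self, divergences: Sequence[float], correct: Sequence[bool]) -> "DivergenceCalibrator":
        if len(divergences) != len(correct) or not divergences:
            raise ValueError("inputs must be non-empty and of equal length")
        n = len(divergences)
        for _ in range(self.epochs):
            grad_w = grad_b = 0.0
            for d, y in zip(divergences, correct):
                # Gradient of the log-loss with respect to the logit is (p - y).
                err = self._sigmoid(self.w * d + self.b) - (1.0 if y else 0.0)
                grad_w += err * d
                grad_b += err
            # Full-batch gradient descent on the mean log-loss.
            self.w -= self.lr * grad_w / n
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, divergence: float) -> float:
        """Calibrated probability that the LLM's prediction is correct."""
        return self._sigmoid(self.w * divergence + self.b)


if __name__ == "__main__":
    # Toy held-out set (made up): low divergence tends to mean a correct LLM answer.
    divs = [0.05, 0.10, 0.15, 0.20, 0.60, 0.70, 0.80, 0.90]
    labels = [True, True, True, True, False, False, False, False]
    cal = DivergenceCalibrator().fit(divs, labels)
    print(f"P(correct | d=0.1) = {cal.predict(0.1):.2f}")
    print(f"P(correct | d=0.8) = {cal.predict(0.8):.2f}")
```

Because the calibrator consumes only divergence scores and correctness labels, it never touches the LLM's weights or logits, which is the post-hoc, no-retraining property described above. A real calibrator might add further inputs (for example, the XGBoost probability), but the summary does not specify them.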
For the broader AI development community, this work signals that confidence mechanisms must be validated empirically rather than assumed to work by design. Organizations considering LLMs for structured medical or financial data should implement similar divergence-based validation before deployment. The paper's cold-start framing suggests that LLMs need explicit calibration protocols when moving to a new domain.
- →LLM verbalized confidence is unreliable on clinical tasks, staying within 85-93% regardless of actual accuracy.
- →Attribution divergence between LLMs and XGBoost serves as a more trustworthy epistemic signal than model-generated confidence scores.
- →Cross-model calibrators can reduce calibration error by 69% without accessing model internals or requiring retraining.
- →Few-shot examples and SHAP feature evidence operate independently to improve both accuracy and attribution agreement.
- →LLMs exhibit an inverse difficulty effect on structured data, performing worse on exactly the cases where the comparison model is most certain.
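The divergence signal can be illustrated by comparing two feature-attribution profiles for the same patient, one from the LLM and one from XGBoost (for example, SHAP values). The sketch uses Jensen-Shannon divergence over normalized absolute attributions. That choice, the function name `attribution_divergence`, and the feature names are assumptions, since the summary states neither the paper's exact metric nor how LLM attributions are elicited.

```python
import math
from typing import Mapping


def attribution_divergence(
    llm_attr: Mapping[str, float],
    xgb_attr: Mapping[str, float],
) -> float:
    """Jensen-Shannon divergence (log base 2, so the range is [0, 1]) between
    the normalized absolute attribution profiles of two models."""
    features = sorted(set(llm_attr) | set(xgb_attr))
    if not features:
        raise ValueError("at least one feature attribution is required")

    def normalize(attr: Mapping[str, float]) -> list[float]:
        # Missing features count as zero attribution; signs are discarded.
        mags = [abs(attr.get(f, 0.0)) for f in features]
        total = sum(mags)
        if total == 0:
            # No signal at all: fall back to a uniform profile.
            return [1.0 / len(features)] * len(features)
        return [m / total for m in mags]

    p, q = normalize(llm_attr), normalize(xgb_attr)
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Skip a_i == 0 terms (0 * log 0 = 0); b_i > 0 wherever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


if __name__ == "__main__":
    # Hypothetical per-patient attributions over three made-up features.
    llm = {"age": 0.40, "lactate": 0.35, "creatinine": 0.25}
    xgb = {"age": 0.05, "lactate": 0.15, "creatinine": 0.80}
    print(f"divergence = {attribution_divergence(llm, xgb):.3f}")
```

Taking absolute values ignores whether a feature pushes toward or away from the predicted class; a sign-aware variant, or a rank-based measure such as top-k feature overlap, would be an equally reasonable design choice.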