Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers
Researchers evaluated how multimodal large language models (MLLMs) explain their image classification decisions in few-shot, in-context learning scenarios. The study found that forcing models to generate formal, concept-based explanations reduces their predictive accuracy from 93.8% to 90.1%, suggesting that explicit reasoning does not universally improve performance, contrary to a widely held assumption.
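To put the reported drop in perspective, the short calculation below restates it as an absolute drop, a relative drop, and a change in error rate. It is only arithmetic on the two figures quoted above; the variable names are illustrative, and nothing here reproduces the study's evaluation.

```python
# The two accuracy figures quoted in the summary (percent).
baseline_acc = 93.8   # classification without formal explanations
explained_acc = 90.1  # classification with formal concept-based explanations

# Absolute drop in percentage points, and the same drop relative to the baseline.
abs_drop = baseline_acc - explained_acc
rel_drop = abs_drop / baseline_acc * 100

# The same change viewed as error rates.
baseline_err = 100 - baseline_acc
explained_err = 100 - explained_acc
err_increase = (explained_err - baseline_err) / baseline_err * 100

print(f"absolute drop: {abs_drop:.1f} percentage points")
print(f"relative drop: {rel_drop:.1f}%")
print(f"error rate: {baseline_err:.1f}% -> {explained_err:.1f}% (+{err_increase:.1f}%)")
```

Seen as error rates, the change is from 6.2% to 9.9%, which is roughly 60% more misclassifications even though the accuracy figure moves by only 3.7 points.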
This research exposes a fundamental tension in AI model design: the gap between predictive capability and interpretability. MLLMs demonstrate strong visual classification performance using in-context learning, yet struggle when required to formalize their reasoning into machine-verifiable explanations. The finding challenges a widespread assumption in AI safety and transparency communities that explicit reasoning chains improve both model accuracy and trustworthiness.
The work builds on growing skepticism about Chain-of-Thought prompting, suggesting that current explanation methods may mask rather than reveal how MLLMs actually process visual information. By testing four state-of-the-art models against increasingly rigorous explanation standards, from baseline classification through Description Logics axioms, the researchers demonstrated a monotonic degradation in accuracy as formal constraints tightened. This suggests that MLLMs lack the instruction-tuning needed to generate formally structured explanations without a performance penalty.
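For readers unfamiliar with Description Logics, a concept-based explanation expressed as an axiom might look roughly like the following. The class, concept, and role names (Zebra, Equine, hasCoatPattern, Stripes) are invented here for illustration and are not taken from the study.

```latex
\[
\mathit{Zebra} \sqsubseteq \mathit{Equine} \sqcap \exists\, \mathit{hasCoatPattern}.\mathit{Stripes}
\]
```

Read aloud: every zebra is an equine that has at least one coat pattern that is striped. An axiom of this shape can be checked by a reasoner, which is what makes it machine-verifiable in a way a free-text rationale is not.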
The positive correlation between explanation quality and correct predictions offers a crucial insight: when models successfully identify class-discriminative visual features, they also tend to predict the correct class. This suggests the problem is not an inherent inability to explain, but insufficient training for formal explainability tasks. For AI developers and enterprises deploying MLLMs in high-stakes domains that require auditability, the research signals that current models cannot reliably provide accurate predictions and verifiable explanations at the same time.
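One common way to quantify a relationship like the one described above is a point-biserial correlation, which is a Pearson correlation where one variable is binary (correct or incorrect). The sketch below assumes each prediction has a numeric explanation-quality score; the scores and labels are made-up placeholder values, and the summary does not say which statistic the authors actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one explanation-quality score in [0, 1] per prediction,
# and 1/0 for whether that prediction was correct. Not from the study.
quality = [0.90, 0.80, 0.30, 0.70, 0.20, 0.85, 0.40, 0.95]
correct = [1, 1, 0, 1, 0, 1, 1, 1]

r = pearson(quality, correct)
print(f"point-biserial r = {r:.2f}")
```

A positive `r` means higher-quality explanations tend to accompany correct predictions. On its own it does not show that producing a better explanation causes the prediction to be correct.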
The implications extend beyond academia. Organizations requiring regulatory compliance or stakeholder transparency may need to choose between raw predictive performance and formal explainability, potentially settling for lower accuracy in exchange for trustworthy, verifiable reasoning. Future MLLM development should prioritize instruction-tuning specifically for formal explanation generation rather than assuming it emerges naturally from scaling.
- Forcing MLLMs to generate formal concept-based explanations reduces predictive accuracy from 93.8% to 90.1%, contradicting the assumption that explicit reasoning aids performance.
- Current MLLMs lack sufficient instruction-tuning for machine-verifiable explainability despite excelling at visual classification tasks.
- Explanation quality correlates positively with correct predictions: models that articulate class-discriminative features tend to classify correctly.
- Chain-of-Thought prompting may not accurately reflect internal model computation, raising questions about its reliability for interpretability.
- Organizations requiring both high accuracy and formal explainability may face performance trade-offs with current MLLM technology.