FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
Researchers introduce FaCT, a new approach for explaining neural network decisions through faithful concept-based explanations that don't rely on restrictive assumptions about how models learn. The method includes a new evaluation metric (C²-Score) and demonstrates improved interpretability while maintaining competitive performance on ImageNet.
The challenge of interpreting deep neural networks has become increasingly critical as these models influence high-stakes decisions across industries. FaCT addresses a fundamental limitation of existing concept-based explanation methods: their lack of faithfulness to actual model behavior. Prior approaches often impose artificial constraints, assuming that concepts must align with human intuition, remain spatially localized, or correspond to specific classes. This disconnect between explanation and actual computation undermines trust in both the explanations and the models they describe.
The development reflects a broader shift in AI research toward mechanistic interpretability. Rather than retrofitting human-understandable concepts onto models post-hoc, FaCT builds concept tracing directly into the model architecture, allowing researchers to track how concepts contribute to specific predictions at any layer. This layer-agnostic approach provides genuine insight into computational processes rather than approximations.
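To make the idea of layer-wise concept tracing concrete, here is a minimal illustrative sketch, not the paper's actual method: it assumes a layer activation that decomposes into a dictionary of concept directions with non-negative coefficients, and a linear class readout, so each concept's additive contribution to the logit can be read off exactly. All names and shapes here are hypothetical.

```python
import numpy as np

# Hypothetical sketch of concept tracing at one layer (not FaCT's real code).
# Assumption: the activation h decomposes as h = C @ a, where the columns of
# C are concept directions and a holds non-negative concept coefficients,
# and a linear readout w maps h to one class logit.

rng = np.random.default_rng(0)
d, k = 16, 4                      # feature dimension, number of concepts
C = rng.normal(size=(d, k))      # concept directions (one per column)
a = np.abs(rng.normal(size=k))   # concept coefficients for one input
w = rng.normal(size=d)           # linear readout weights for one class

h = C @ a                         # layer activation reconstructed from concepts
logit = float(w @ h)              # class logit computed from this layer

# Because the readout is linear, the logit splits exactly into per-concept
# contributions: contrib[j] = (w · C[:, j]) * a[j].
contrib = (w @ C) * a
assert np.isclose(contrib.sum(), logit)

# Rank concepts by how much they pushed this prediction.
ranking = np.argsort(contrib)[::-1]
print("per-concept contributions:", np.round(contrib, 3))
print("most influential concept index:", ranking[0])
```

The exact additivity (`contrib.sum() == logit`) is what makes such a decomposition faithful by construction rather than a post-hoc approximation, which is the property the article attributes to FaCT.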
The introduction of the C²-Score metric, leveraging foundation models for evaluation, establishes a more objective standard for assessing explanation quality. This addresses a long-standing problem: evaluating whether concept-based methods actually explain what the model does versus what humans expect it to do.
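The precise definition of C²-Score is not given here, but the general idea of using a foundation model to judge concept quality can be sketched as follows: embed the top-activating images for a concept with a pretrained encoder, then score how semantically consistent they are via mean pairwise cosine similarity. This is an illustrative stand-in under that assumption, not the published metric.

```python
import numpy as np

def consistency_score(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among rows of an (n, d) array.

    A stand-in for a foundation-model-based consistency check: embeddings
    would come from a pretrained encoder applied to a concept's
    top-activating images (hypothetical setup, not the actual C²-Score).
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = len(embeddings)
    off_diag = sim[~np.eye(n, dtype=bool)]  # exclude self-similarity
    return float(off_diag.mean())

rng = np.random.default_rng(1)
coherent = rng.normal(size=(8, 32)) + 5.0   # tightly clustered "embeddings"
scattered = rng.normal(size=(8, 32))        # unrelated "embeddings"

# A coherent concept should score higher than a scattered one.
print(consistency_score(coherent), consistency_score(scattered))
```

A higher score suggests the concept's exemplars share a common semantic theme, approximating the human-free evaluation of explanation quality the article describes.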
For practitioners deploying deep networks in critical domains—medical imaging, autonomous systems, financial prediction—faithful explanations directly reduce liability and improve debugging. For researchers, this work advances the emerging field of mechanistic interpretability, moving beyond black-box descriptions toward genuine understanding of neural computation. The maintenance of competitive ImageNet performance suggests these improvements don't sacrifice accuracy, making adoption more feasible.
- FaCT provides faithful concept-level explanations without restrictive assumptions about class-specificity or spatial extent
- The C²-Score metric enables objective evaluation of concept-based explanation methods using foundation models
- Concepts can be traced from any network layer to understand their contribution to predictions
- Users rated FaCT concepts as more interpretable while the model retains competitive ImageNet performance
- This approach enables mechanistic understanding of deep networks rather than post-hoc approximations