
#model-interpretability News & Analysis

31 articles tagged with #model-interpretability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models

Researchers introduce GlassMol, a new interpretable AI model for molecular property prediction that addresses the black-box problem in drug discovery. The model uses Concept Bottleneck Models with automated concept curation and LLM-guided selection, achieving performance that matches or exceeds traditional black-box models across thirteen benchmarks.
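For readers new to the architecture, here is a minimal concept-bottleneck sketch in PyTorch. This is the generic CBM pattern, not GlassMol's actual network; the layer sizes, concept count, and example concept names are invented for illustration.

```python
# A concept bottleneck model: predict human-readable concepts first,
# then predict the label only from those concepts, so every prediction
# can be audited at the bottleneck.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        # input -> concept logits (e.g. "has aromatic ring", "is lipophilic")
        self.concept_net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # concepts -> label; a linear map keeps this stage auditable
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_net(x))  # interpretable layer
        return self.label_net(concepts), concepts

model = ConceptBottleneckModel(n_features=128, n_concepts=10, n_classes=2)
x = torch.randn(4, 128)                 # stand-in molecular descriptors
logits, concepts = model(x)
print(logits.shape, concepts.shape)     # torch.Size([4, 2]) torch.Size([4, 10])
```

Because the label head sees only the concept activations, any prediction can be traced back to which concepts fired, which is the property GlassMol exploits.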

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠

Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs

Researchers investigated whether large language models can introspect, testing whether Meta-Llama-3.1-8B-Instruct can detect perturbations injected into its own internal states. They found that the binary detection results reported in prior work were confounded by methodological artifacts, yet the model does show partial introspection: it localized injected sentences with 88% accuracy and discriminated injection strengths with 83% accuracy, but only for early-layer perturbations.
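The experimental setup reduces to adding a vector to a hidden layer mid-forward-pass. A toy sketch of that mechanism, using PyTorch forward hooks on a tiny stand-in network (a real run would hook a residual-stream layer of Meta-Llama-3.1-8B-Instruct; the dimensions and perturbation scale here are arbitrary):

```python
# Inject a fixed perturbation vector into an early hidden layer via a
# forward hook, then verify the computation actually changed.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),   # "early layer" we perturb
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 8),
)
injection = 3.0 * torch.randn(32)   # the hidden-state perturbation

def inject(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output.
    return output + injection

x = torch.randn(1, 16)
clean = model(x)
handle = model[0].register_forward_hook(inject)
perturbed = model(x)
handle.remove()

# An introspection probe must detect/localize this shift from the
# model's own behavior; here we only confirm it propagated.
print("max output change:", (clean - perturbed).abs().max().item())
```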

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10
🧠

GNN Explanations that do not Explain and How to find Them

Researchers have identified critical failures in Self-explainable Graph Neural Networks (SE-GNNs) where explanations can be completely unrelated to how the models actually make predictions. The study reveals that these degenerate explanations can hide the use of sensitive attributes and can emerge both maliciously and naturally, while existing faithfulness metrics fail to detect them.
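The baseline intuition behind faithfulness checks is a fidelity test: ablate what the explanation points at and see whether the prediction moves. A NumPy toy of that idea is below (the model and masking scheme are placeholders, not the paper's SE-GNN protocol; the paper's point is that such metrics can still be fooled in subtler cases than this one):

```python
# Fidelity-style check: how much does the prediction move when the
# features an explanation highlights are removed? A degenerate
# explanation leaves the prediction untouched.
import numpy as np

def fidelity_drop(model, x, mask):
    """Prediction change when the explained features are ablated."""
    return float(abs(model(x) - model(x * (1 - mask))))

# Toy "model" that secretly relies only on feature 0.
model = lambda z: 1.0 / (1.0 + np.exp(-z[0]))
x = np.array([2.0, -1.0, 0.5])

honest = np.array([1.0, 0.0, 0.0])      # points at the feature actually used
degenerate = np.array([0.0, 1.0, 1.0])  # plausible-looking, but irrelevant

print(fidelity_drop(model, x, honest))      # large drop: explanation matters
print(fidelity_drop(model, x, degenerate))  # ~0: explains nothing
```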

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10
🧠

Informative Perturbation Selection for Uncertainty-Aware Post-hoc Explanations

Researchers introduce EAGLE, a new framework for explaining black-box machine learning models using information-theoretic active learning to select optimal data perturbations. The method produces feature importance scores with uncertainty estimates and demonstrates improved explanation reproducibility and stability compared to existing approaches like LIME.
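A simplified analogue of the selection loop: fit a Bayesian linear surrogate to the black-box outputs (à la LIME, but with a posterior), then query next the candidate perturbation the surrogate is least sure about. EAGLE's actual acquisition rule is information-theoretic; the uncertainty-sampling rule, the toy black box, and all constants below are stand-ins.

```python
# Uncertainty-driven perturbation selection for a surrogate explainer.
import numpy as np

rng = np.random.default_rng(0)
black_box = lambda X: X @ np.array([2.0, 0.0, -1.0])  # hidden from explainer
d, noise, prior = 3, 0.1, 1.0

def posterior(X, y):
    """Closed-form Bayesian linear regression posterior N(mu, Sigma)."""
    Sigma = np.linalg.inv(X.T @ X / noise**2 + np.eye(d) / prior**2)
    return Sigma @ X.T @ y / noise**2, Sigma

X = rng.normal(size=(5, d))      # initial random perturbations
y = black_box(X)
for _ in range(20):
    mu, Sigma = posterior(X, y)
    cand = rng.normal(size=(200, d))
    var = np.einsum("ij,jk,ik->i", cand, Sigma, cand)  # predictive variance
    X = np.vstack([X, cand[np.argmax(var)]])           # most informative query
    y = np.append(y, black_box(X[-1]))

mu, Sigma = posterior(X, y)
print("importances:", mu.round(2))                     # ~ [2, 0, -1]
print("uncertainty:", np.sqrt(np.diag(Sigma)).round(3))
```

The payoff is the pairing in the last two lines: each feature-importance score comes with an uncertainty estimate, which is what makes the resulting explanation's reproducibility measurable.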

AI · Neutral · arXiv – CS AI · Mar 2 · 5/10
🧠

Hierarchical Concept-based Interpretable Models

Researchers introduce Hierarchical Concept Embedding Models (HiCEMs), a new approach to make deep neural networks more interpretable by modeling relationships between concepts in hierarchical structures. The method includes Concept Splitting to automatically discover fine-grained sub-concepts without additional annotations, reducing the burden of manual labeling while improving model accuracy and interpretability.
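A toy of the hierarchical structure: sub-concept activations gated by their parent concept, so a prediction decomposes parent → sub-concept. The dimensions and the multiplicative gating rule below are illustrative, not HiCEM's exact formulation.

```python
# Hierarchical concept layer: a sub-concept can only be active
# if its parent concept is.
import torch
import torch.nn as nn

class HierConceptLayer(nn.Module):
    def __init__(self, d_in: int, n_parents: int, subs_per_parent: int):
        super().__init__()
        self.parent_scorer = nn.Linear(d_in, n_parents)
        self.sub_scorer = nn.Linear(d_in, n_parents * subs_per_parent)
        self.shape = (n_parents, subs_per_parent)

    def forward(self, x):
        parents = torch.sigmoid(self.parent_scorer(x))   # (B, P)
        subs = torch.sigmoid(self.sub_scorer(x))         # (B, P*S)
        subs = subs.view(-1, *self.shape)                # (B, P, S)
        # Gate each sub-concept by its parent's activation.
        return parents, subs * parents.unsqueeze(-1)

layer = HierConceptLayer(d_in=32, n_parents=4, subs_per_parent=3)
parents, gated_subs = layer(torch.randn(2, 32))
print(parents.shape, gated_subs.shape)   # (2, 4) and (2, 4, 3)
```

Concept Splitting, per the summary, would discover the sub-concept slots automatically rather than requiring them to be annotated up front.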

AI · Bullish · arXiv – CS AI · Mar 2 · 4/10
🧠

Joint Distribution-Informed Shapley Values for Sparse Counterfactual Explanations

Researchers introduce COLA, a framework that refines counterfactual explanations in AI models by using optimal transport theory and Shapley values to achieve the same prediction changes with 26-45% fewer feature modifications. The method works across different datasets and models to create more actionable and clearer AI explanations.
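The sparsity idea can be shown in a few lines: rank features by Shapley value, then flip the fewest high-importance features needed to change the prediction. COLA additionally uses optimal transport to choose the target values; in this sketch the reference point is fixed by hand and the classifier is a toy.

```python
# Sparse counterfactual via Shapley ranking: exact Shapley values for a
# tiny model, then greedy edits of the most influential features toward
# a reference point until the prediction flips.
import math
import numpy as np
from itertools import permutations

model = lambda z: float(z[0] + 2 * z[1] - z[2] > 0)   # toy classifier
x = np.array([1.0, 1.0, 1.0])                         # instance to explain
ref = np.array([-1.0, -1.0, -1.0])                    # counterfactual baseline

def shapley(model, x, ref):
    d = len(x)
    phi = np.zeros(d)
    for perm in permutations(range(d)):   # exact; feasible only for tiny d
        z = ref.copy()
        for i in perm:
            before = model(z)
            z[i] = x[i]
            phi[i] += model(z) - before   # marginal contribution of i
    return phi / math.factorial(d)

order = np.argsort(-np.abs(shapley(model, x, ref)))   # most influential first
cf = x.copy()
for i in order:                                       # greedy sparse edits
    cf[i] = ref[i]
    if model(cf) != model(x):
        break
print("counterfactual:", cf, "| features changed:", int((cf != x).sum()))
```

Here a single edit to the highest-Shapley feature flips the class, which is exactly the kind of minimal, actionable change the 26-45% sparsity gains are about.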
