#model-explainability News & Analysis

10 articles tagged with #model-explainability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Researchers introduce MedGuideX, a medical language model trained on executable clinical decision logic extracted from practice guidelines, which achieves a 10.28% accuracy improvement over existing methods. The approach transforms procedural guideline structures into synthetic training data that teaches models both correct clinical decisions and counterfactual reasoning, and physician validation confirms improved explanation quality.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

Researchers propose a cost-effective framework in which smaller, efficient proxy models approximate the explanations that interpretability methods produce for expensive large language models (LLMs), achieving over 90% fidelity at just 11% of the computational cost. The framework includes verification mechanisms and demonstrates practical applications in prompt compression and data cleaning, making interpretability tools viable for real-world LLM development.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models

Researchers systematically analyzed how eight large language models encode essay quality in their hidden representations across three datasets. Using linear probing and neuron-level analysis, they found that essay quality is encoded in a linearly accessible form, emerges progressively across layers, and partially transfers across different essay prompts, with individual 'essay scoring neurons' strongly correlated with scores.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Exploring Accurate and Transparent Domain Adaptation in Predictive Healthcare via Concept-Grounded Orthogonal Inference

Researchers introduce ExtraCare, a domain adaptation method for clinical AI models that decomposes patient data into interpretable components while maintaining prediction accuracy across different healthcare datasets. The approach addresses a critical gap in healthcare AI by pairing strong predictive performance with transparent, explainable outputs, both of which are essential for clinical adoption, where transparency and safety are paramount.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Explaining Black-Box Language Models: Learning to Optimize Linguistically-Structured Word Subsets

Researchers propose a method for explaining black-box language model predictions by identifying linguistically structured word subsets, without requiring access to internal model parameters or gradients. The approach combines reinforcement learning with graph-based linguistic knowledge to generate interpretable, efficient explanations that outperform existing methods across multiple architectures and datasets.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Comprehensive and Reliable Feature Attribution for Diverse Modalities and Models via Frequency-Domain Insights

Researchers introduce FreqX, an interpretability method for machine learning models that draws on signal processing and information theory to address challenges in personalized federated learning. The approach runs 10x faster than existing methods and provides both attribution and concept information while preserving privacy.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

Researchers propose an approach that combines cellular sheaves with attention-based multiple instance learning to improve interpretability in weakly supervised pathology image classification. The method achieves a patch-level AUC of 0.940 on Camelyon16 and aligns attention maps with diagnostic regions, addressing a critical failure mode in which models classify correctly without attending to the actual lesions.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

Researchers propose an interpretation method for Transformer models with heterogeneous attention structures, which process information from multiple sources. The work responds to the growing need to understand complex AI systems, particularly as they integrate diverse data modalities and support increasingly sophisticated agent applications.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

How Reliable are LLMs for Reasoning on the Re-ranking task?

Researchers investigate whether large language models (LLMs) reliably perform re-ranking by analyzing how different training methods affect semantic understanding and reasoning transparency. The study finds that some training approaches yield more explainable behavior than others, suggesting that LLMs may optimize for evaluation metrics rather than genuine semantic comprehension and raising concerns about their actual reliability in ranking applications.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

Researchers compared how large language models fine-tuned with different methods (full fine-tuning, LoRA, and quantized LoRA) interpret code compliance tasks. The study finds that full fine-tuning produces more focused attribution patterns than parameter-efficient methods, and that larger models develop distinct interpretive strategies even though performance gains plateau above 7B parameters.