#interpretability News & Analysis

318 articles tagged with #interpretability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

SAILS: Surrogate-based Analysis of Interactions via Local Effect Smooths

Researchers introduce SAILS, a model-agnostic framework that goes beyond detecting feature interactions in machine learning models to reveal their functional forms and characteristics. Using surrogate generalized additive models, SAILS categorizes interactions as linear, product-separable, or non-product-separable and provides tailored visualizations, advancing the field of explainable AI.
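To make the "product-separable" category concrete: an interaction term is product-separable when it factors as g(x1)·h(x2), which means its values on a rectangular grid form a matrix of rank at most one. The sketch below checks that property through 2x2 minors. It only illustrates the definition and is not the SAILS procedure; the function name, the tolerance, and the two toy interaction surfaces are assumptions made for this example.

```python
import math

def is_product_separable(grid, tol=1e-9):
    """Return True if every 2x2 minor of the grid vanishes (rank at most one).

    grid[i][j] holds the interaction value f(x1_i, x2_j) on a rectangular grid.
    The name and tolerance are hypothetical, chosen for this illustration.
    """
    rows, cols = len(grid), len(grid[0])
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    minor = grid[i][j] * grid[k][l] - grid[i][l] * grid[k][j]
                    if abs(minor) > tol:
                        return False
    return True

# Toy interaction surfaces (assumptions, not data from the paper).
xs = [0.5, 1.0, 1.5]
zs = [0.5, 1.0, 1.5]
separable = [[math.sin(x) * z ** 2 for z in zs] for x in xs]  # g(x) * h(z)
entangled = [[math.sin(x * z) for z in zs] for x in xs]       # does not factor
```

Here `separable` passes the check and `entangled` fails it, since sin(x·z) cannot be written as a product of a function of x and a function of z.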

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

Researchers propose a fine-tuned speech language model that provides both multi-level L2 English proficiency assessment and natural-language explanations for its predictions. The model demonstrates competitive performance on standard benchmarks while offering improved interpretability, though generated rationales show lower reliability at granular word-level assessments.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

Researchers propose a methodology for validating attention-head circuits in large language models by combining co-activation clustering with causal ablation testing. Their findings indicate that clustering signals can only propose candidate circuits; validating a circuit requires closure tests that measure its functional impact through ablation, a distinction that challenges current interpretability approaches.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Geometric Unification of Concept Learning with Concept Cones

Researchers demonstrate that Concept Bottleneck Models and Sparse Autoencoders, two distinct interpretability approaches in machine learning, share an underlying geometric structure based on concept cones. This unification enables quantitative evaluation of how well unsupervised concept discovery aligns with human-defined concepts, advancing AI interpretability standards.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Hyperflux: Pruning Reveals Importance

Researchers introduce Hyperflux, a novel L0 pruning method that models neural network pruning as a dynamically evolving system driven by flux and pressure mechanisms. The approach provides interpretability at multiple scales while achieving competitive sparsity results on standard vision benchmarks, advancing understanding of how neural networks can be efficiently compressed.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Understanding Benchmark Language Under Weakened Formal Semantics

Researchers propose a method to improve NLP benchmark understanding by extracting executable representations (computables) that provide operational evidence of semantic adequacy beyond traditional text-based reasoning. The approach demonstrates consistent improvements over baseline methods across mathematical reasoning, legal, and biomedical benchmarks while offering inspectable semantic evidence.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality

Researchers introduce an information-theoretic framework to measure representational ambiguity in neural networks, demonstrating that network connectivity structures can encode unambiguous content independent of behavioral performance. Using MNIST classification experiments, they achieve 100% accuracy in identifying output neuron class identity from relational structure alone in dropout-trained networks, suggesting neural systems can exhibit the low-ambiguity representations theorized as necessary for consciousness.

AI · Neutral · arXiv – CS AI · Jun 8 · 5/10

Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

Researchers demonstrate that instruction-following audio language models can effectively utilize explicit acoustic cues for speech emotion recognition, with aligned acoustic tokens improving performance on standard benchmarks while remaining grounded in the underlying audio signal.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Endogenous Resistance to Activation Steering in Language Models

Researchers demonstrate that large language models exhibit Endogenous Steering Resistance (ESR), the ability to detect and recover from activation-space steering attempts mid-generation, with Llama-3.3-70B showing explicit resistance in over half of cases. The discovery reveals both a potential safety feature against adversarial manipulation and a complication for beneficial steering-based interventions, since models cannot distinguish between malicious and helpful steering.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Evidence-Based Intelligent Diagnostic and Therapeutic Visualization System with Large Language Models: Multi-Turn Interaction and Multimodal Treatment Plan Generation

Researchers developed an AI-enhanced diagnostic system for traditional Chinese medicine that combines Neo4j knowledge graphs, large language models, and multimodal visualization to improve diagnostic transparency and treatment planning. The system demonstrated a 32% reduction in non-standard outputs and significantly improved diagnostic trust and credibility compared to existing tools.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Beyond Post-hoc Explanation: Toward Glassbox AI via Probabilistic Mediation

Researchers propose the Glassbox Framework, a new AI architecture that replaces post-hoc explainability with ante-hoc probabilistic mediation using Bayesian networks as transparent reasoning layers for large language models. This approach aims to make AI systems fundamentally accountable in high-stakes domains like healthcare, law, and public administration by encoding domain knowledge and causal assumptions before inference occurs.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Attention Consistent Longitudinal Medical Visual Question Answering Guided by Vision Foundation Models

Researchers propose a novel attention-guided encoder-decoder architecture for longitudinal medical visual question answering using chest X-rays, incorporating affine registration and vision foundation models (DINO) to identify anatomical changes over time. The approach combines saliency masking with multimodal transformer decoding and auxiliary learning objectives, achieving strong benchmark performance while providing interpretable visual explanations for clinical reasoning.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

MSAIC-Net: A Multi-Scale Attention and Imbalance-Aware Contrastive Network for ECG-Based Myocardial Substrate Abnormality Detection

Researchers present MSAIC-Net, a deep learning framework that improves ECG-based detection of myocardial substrate abnormalities like scarring and heart attacks. The model combines multi-scale attention mechanisms with contrastive learning to address class imbalance and interpretability challenges, demonstrating strong performance on both institutional and public datasets.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Modeling Nonlinear Feature Interactions with Product-Unit Residual Networks

Researchers introduce Product-Unit Residual Networks (PURe), a neural architecture that explicitly models nonlinear feature interactions through multiplicative units combined with residual connections. The approach demonstrates improved interpretability, robustness to noise, and sample efficiency compared to standard MLPs across synthetic and real-world datasets.
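For readers unfamiliar with multiplicative units, a classic product unit computes a weighted product of its inputs, prod_i x_i^(w_i), usually evaluated in log space for strictly positive inputs. The sketch below pairs that unit with a plain skip connection. It is a minimal illustration of the general idea, not the PURe architecture itself; the block layout, function names, and example weights are assumptions.

```python
import math

def product_unit(x, w):
    """Compute prod_i x_i ** w_i in log space (inputs must be strictly positive)."""
    return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))

def residual_product_block(x, weights):
    """Hypothetical residual block: y_j = x_j + product_unit(x, weights[j]).

    weights holds one exponent vector per output, so len(weights) == len(x).
    """
    return [xj + product_unit(x, wj) for xj, wj in zip(x, weights)]

# Example with hand-picked exponents (assumptions for illustration).
x = [2.0, 3.0]
weights = [[1.0, 1.0],   # x1 * x2
           [1.0, 0.0]]   # x1 alone
y = residual_product_block(x, weights)
```

With exponents of 1 on both inputs the unit reduces to the pairwise product x1·x2, which is why such units can represent feature interactions directly rather than approximating them with stacked additive layers.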