AI · Bullish · arXiv – CS AI · Jun 1 · 7/10
🧠Researchers have proven that Shapley values, a key framework for attribution in machine learning, depend exclusively on the odd component of set functions. The result explains why paired sampling is effective and motivates OddSHAP, a new estimator that reports state-of-the-art accuracy by performing regression solely on the odd subspace using a Fourier basis decomposition.
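The claim that Shapley values see only the odd component can be checked by brute force on a small game. The sketch below is not OddSHAP itself; it assumes the odd and even parts of a set function v over players N are defined by complementation, v_odd(S) = (v(S) - v(N \ S)) / 2 and v_even(S) = (v(S) + v(N \ S)) / 2, which may differ in notation from the paper. The helper names `shapley` and `split_odd_even` are illustrative only.

```python
from itertools import combinations
from math import factorial
import random


def shapley(v, n):
    """Exact Shapley values of a set function v over players 0..n-1.

    v takes a frozenset of players and returns a float.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += w * (v(s | {i}) - v(s))
    return phi


def split_odd_even(v, n):
    """Split v into odd and even parts under complementation (assumed definition)."""
    full = frozenset(range(n))

    def odd(s):
        return 0.5 * (v(s) - v(full - s))

    def even(s):
        return 0.5 * (v(s) + v(full - s))

    return odd, even


# A random set function on 4 players, stored as a lookup table.
n = 4
rng = random.Random(0)
table = {
    frozenset(s): rng.random()
    for k in range(n + 1)
    for s in combinations(range(n), k)
}
v = table.__getitem__

odd, even = split_odd_even(v, n)
phi_full = shapley(v, n)
phi_odd = shapley(odd, n)
phi_even = shapley(even, n)

# The odd part alone reproduces the Shapley values; the even part contributes nothing.
print(max(abs(a - b) for a, b in zip(phi_full, phi_odd)))
print(max(abs(e) for e in phi_even))
```

Under this definition, both printed values should be zero up to floating-point error. Evaluating v on a coalition together with its complement yields the odd part directly, which is one way to read the link to paired sampling.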
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce causal dimensionality (kappa), a measurable property quantifying how transformer layers causally influence model outputs, finding that representational capacity grows 15.6x faster than causal capacity across scaling conditions. The metric remains invariant to model size increases, suggesting causal influence is a fundamental architectural property independent of parameter count.
AI · Neutral · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers demonstrate that traditional explainable AI methods designed for static predictions fail when applied to agentic AI systems that make sequential decisions over time. The study shows that attribution-based explanations work well for static tasks, but trace-based diagnostics are needed to understand failures in multi-step agent behavior.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers propose Geodesic Integrated Gradients (GIG), a new method for explaining AI model decisions that uses curved paths instead of straight lines to compute feature importance. The method addresses flawed attributions in existing approaches by integrating gradients along geodesic paths under a model-induced Riemannian metric.
AI · Neutral · arXiv – CS AI · Jun 1 · 6/10
🧠Researchers propose a framework to attribute AI model behavior to specific development stages (pretraining, fine-tuning, alignment), enabling accountability tracking without model retraining. The method quantifies how each stage contributes to model outputs and can identify spurious correlations, advancing transparency in AI development.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers propose π-Soft-NC and π-Soft-NS, improved evaluation metrics for assessing input attribution methods in large language models that control for the number of retained words, addressing a fundamental bias in existing faithfulness evaluation frameworks. They also introduce Grad-ELLM, a gradient-based attribution method designed for decoder-only LLMs that combines gradient and attention mechanisms for stronger explanatory performance.
🧠 Llama
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers have developed attribution techniques that explain decision-making in Markov Decision Processes (MDPs), extending explainability methods beyond static inputs to sequential decision-making systems. The approach assigns importance scores to states and execution paths, enabling more interpretable AI agents in dynamic environments.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce FAMPE, a novel attribution method that uses frequency-domain analysis to improve explainability in deep neural networks. By separately perturbing high and low-frequency components through FFT-based techniques, the method outperforms existing attribution approaches on ImageNet across multiple architectures without requiring manual baseline selection.
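The FFT-based separation of high- and low-frequency components mentioned above can be illustrated with a minimal NumPy sketch. This is not FAMPE: it shows only the frequency split that such a perturbation scheme could build on, and the function name `split_frequencies`, the circular mask, and the `radius` parameter are assumptions made for illustration.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2D array into low- and high-frequency parts.

    A circular mask of the given radius (in frequency bins) around the
    zero-frequency bin of the centered spectrum selects the low band;
    everything outside the mask is the high band.
    """
    h, w = image.shape
    # Move the zero-frequency bin to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


# A random stand-in for a grayscale image.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img, radius=4)

# The two bands partition the spectrum, so they sum back to the input.
print(np.allclose(low + high, img))
```

Because the two masks are complementary, the bands always sum back to the original array, so each band can be perturbed on its own and recombined with the untouched one.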