AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers identify why Graph Neural Network explanations produce inconsistent results when re-applied to their own outputs, attributing this to context perturbation during re-explanation. They propose Self-Denoising, a training-free post-processing method that improves explanation quality with minimal computational overhead.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce DTSemNet, a novel neural network representation of oblique decision trees that enables approximation-free gradient-based training for both classification and regression tasks. The approach eliminates reliance on softening or quantized gradients, achieving superior performance on benchmark datasets and expanding decision tree applicability to reinforcement learning environments.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce DT-PBO, a tree-based surrogate model for Preferential Bayesian Optimization that prioritizes interpretability over traditional Gaussian Process approaches. The method achieves competitive performance on benchmark functions while providing transparent insights into decision-maker preferences, addressing critical needs in high-stakes domains like healthcare.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers challenge recent claims that Chain-of-Thought (CoT) reasoning in language models is unfaithful when it omits prompt-injected hints. The study argues that the Biasing Features metric conflates incompleteness with unfaithfulness, and demonstrates through multiple evaluation approaches that non-verbalized hints can still causally influence predictions. This suggests that token constraints, rather than model deception, explain why hints go unmentioned.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce Mixture-of-Masters (MoM), a sparse mixture-of-experts chess language model that routes moves through specialized GPT experts trained on individual grandmaster playing styles. The system outperforms dense transformer baselines and maintains interpretability by dynamically selecting which grandmaster persona to channel based on game state.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce Annotator Policy Models (APMs), interpretable machine learning models that extract and visualize annotators' implicit safety policies from labeling behavior alone. By revealing disagreement sources—operational failures, policy ambiguity, and value pluralism—APMs enable more transparent and inclusive AI safety policy design without requiring costly additional annotation.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce DataDignity, a new framework for attributing large language model outputs to specific training documents. The study presents FakeWiki, a benchmark of 3,537 fabricated Wikipedia articles designed to test provenance tracking, and proposes ScoringModel, a supervised contrastive ranker that raises document attribution recall from 35% to 52.2% over existing baselines.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose a top-down approach to automatic heuristic design for combinatorial optimization using large language models, where interpretable knowledge becomes the primary search object rather than executable code. This knowledge-first paradigm improves discovery efficiency and generalization across problems compared to traditional code-centric methods, suggesting future progress in AI-driven optimization depends on building reusable, explicit hypotheses.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose concept-based abductive and contrastive explanations that identify minimal sets of high-level concepts causally relevant to vision model predictions. The approach combines human-interpretable concept-based explanations with formal causal reasoning, enabling better understanding of both individual predictions and common model behaviors across image collections.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce CatNet, an algorithm that controls False Discovery Rate (FDR) in LSTM neural networks by combining SHAP feature importance derivatives with a Gaussian Mirror statistical approach. The method addresses overfitting and model interpretability challenges in time-series deep learning through improved feature selection and a novel kernel-based independence measure.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains machine learning models to balance accuracy with explainability by encoding feature importance hierarchies as directed acyclic graphs and using Temporal Integrated Gradients to measure feature contributions. The approach provides statistical guarantees for model interpretability while maintaining convergence properties.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers establish formal connections between Boltzmann machines used in machine learning and Feynman path integrals from quantum mechanics, demonstrating that hidden neural network layers function as discrete path elements. This theoretical bridge enables new quantum circuit models and interpretability methods for machine learning systems by leveraging quantum mechanical principles.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers applied sparse autoencoders to a clinical sequence model trained on electronic health records, revealing how the model abstracts medical information across layers. While SAE features outperformed dense representations for mortality prediction in full-sequence settings, dense representations proved superior in clinically relevant scenarios with temporal constraints, suggesting interpretability gains may not translate to practical clinical improvements.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce Gyan, a non-transformer language model designed to address hallucinations, interpretability, and computational inefficiency in current LLMs. The architecture decouples language modeling from knowledge acquisition and achieves state-of-the-art performance while prioritizing explainability and trustworthiness for mission-critical applications.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers propose a novel rule-generation approach to evaluate compositionality in large language models, addressing critical limitations in existing assessment methods that lack explainability and suffer from dataset partition leakage. This new framework requires LLMs to generate executable programs as rules for data mapping, providing more robust insights into how well these models generalize compositional concepts.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce a lightweight LLM agent architecture that uses first- and second-order state dynamics to model gradual clinical concern escalation rather than abrupt threshold-based responses. The approach makes AI decision-making more transparent by revealing sustained risk signals before escalation, enabling better human oversight in clinical settings.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers have published a comprehensive survey on Physical AI that bridges the gap between physical perception and symbolic physics reasoning in AI systems. The work advocates for next-generation world models that integrate physical laws, embodied reasoning, and generative approaches to create AI systems with genuine understanding of physical phenomena rather than pure pattern recognition.
AI · Bullish · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce GAVEL, a rule-based activation monitoring framework that enhances large language model safety by modeling neural activations as interpretable cognitive elements rather than broad behavioral classifiers. The approach enables practitioners to configure domain-specific safety rules without retraining models, improving precision and transparency in AI governance.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers present a novel method combining Large Language Models and Knowledge Graphs to enhance the interpretability of Machine Learning models in manufacturing environments. The approach stores domain-specific data and ML results in a structured knowledge graph, then uses an LLM to generate user-friendly explanations of ML predictions, demonstrating practical applicability in real-world manufacturing decision-making.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose LatentRefusal, a safety mechanism for LLM-based text-to-SQL systems that detects unanswerable queries by analyzing intermediate hidden activations rather than relying on output-level instruction following. The approach achieves 88.5% F1 score across four benchmarks while adding minimal computational overhead, addressing a critical deployment challenge in AI systems that generate executable code.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce FaCT, a new approach for explaining neural network decisions through faithful concept-based explanations that don't rely on restrictive assumptions about how models learn. The method includes a new evaluation metric (C²-Score) and demonstrates improved interpretability while maintaining competitive performance on ImageNet.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose Filtered Reasoning Score (FRS), a new evaluation metric that assesses the quality of reasoning in large language models beyond simple accuracy metrics. FRS focuses on the model's most confident reasoning traces, evaluating dimensions like faithfulness and coherence, revealing significant performance differences between models that appear identical under traditional accuracy benchmarks.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose a semantic bootstrapping framework that transfers knowledge from large language models into interpretable symbolic Tsetlin Machines. The resulting text classification systems achieve BERT-comparable performance while remaining fully transparent and computationally efficient, with no LLM dependency at runtime.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce Hubble, an LLM-driven framework that automates alpha factor discovery in quantitative finance by using large language models constrained by safety mechanisms to generate and refine predictive trading factors. The system achieved a composite score of 0.827 across 181 evaluated factors on U.S. equities, demonstrating that combining AI-driven generation with deterministic safety constraints enables interpretable and reproducible factor discovery.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠A new arXiv paper argues that AI alignment cannot rely solely on stated principles because their real-world application requires contextual judgment and interpretation. The research shows that a significant portion of preference-labeling data involves principle conflicts or indifference, meaning principles alone cannot determine decisions—and these interpretive choices often emerge only during model deployment rather than in training data.