AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce CIVeX, a causal intervention verifier that validates whether tool-calling language agents' proposed actions will actually produce intended effects in real-world execution. The system achieves zero false executions under adversarial conditions and outperforms LLM-based verification approaches by ensuring causal identifiability rather than just schema validity.
🧠 Claude
AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers identify a critical vulnerability in agentic memory systems where Large Language Models retrieve and amplify spurious correlations from stored information, leading to erroneous reasoning in downstream decisions. The study benchmarks this risk and proposes CAMEL, a lightweight calibration method that mitigates spurious pattern reliance while maintaining performance on clean data.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠A research paper argues that mechanistic interpretability studies increasingly make causal claims without explicitly stating their identification assumptions, creating a credibility gap in AI research. The authors audit 10 papers across multiple methodologies and find none contain dedicated identification-assumptions sections, proposing a new disclosure norm requiring researchers to clearly state causal claims, identification strategies, and the assumptions underpinning their conclusions.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose CIKA, a framework using LLMs as interventional simulators to identify which mathematical concepts causally contribute to correct answers, distinguishing genuine causal relationships from spurious correlations. The method achieves 69.7% on Omni-MATH-Rule and 97.2% on GSM8K with a frozen 7B model, outperforming o1-mini on contamination-free benchmarks.
AI · Neutral · arXiv – CS AI · May 9 · 7/10
🧠Researchers developed a causal analysis framework to audit bias in Large Language Models across seven global models, revealing that Western AI systems exhibit higher refusal rates for specific demographics while Eastern models show low intervention rates with regional sensitivities. The study demonstrates that traditional fairness metrics significantly overestimate demographic bias by conflating cultural context with model behavior, challenging current approaches to AI safety evaluation.
🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers developed the first real-world benchmark for evaluating whether large language models can infer causal relationships from complex academic texts. The study reveals that LLMs struggle significantly with this task, with the best-performing model reaching an F1 score of only 0.535, highlighting a critical gap in the reasoning capabilities needed for AGI advancement.
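For readers less familiar with the metric: F1 for causal-relation extraction is commonly scored over predicted versus gold (cause, effect) pairs. The sketch below is a minimal illustration assuming exact-match scoring on directed edges; the benchmark's actual matching rules are not described in this summary, and `edge_f1` and the example variables are hypothetical.

```python
def edge_f1(predicted, gold):
    """F1 over directed (cause, effect) pairs, assuming exact-match scoring."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted causal edges.
gold = [("smoking", "cancer"), ("cancer", "mortality"), ("age", "cancer")]
pred = [("smoking", "cancer"), ("mortality", "cancer")]  # second edge is reversed
print(round(edge_f1(pred, gold), 3))
```

Here precision is 1/2 and recall is 1/3, giving F1 = 0.4; the reversed edge (mortality, cancer) counts as a false positive, so direction errors lower precision.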
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose OrthoFormer, a new Transformer architecture that addresses causal learning limitations by embedding instrumental variable estimation directly into neural networks. The framework aims to distinguish between spurious correlations and true causal mechanisms, potentially improving AI model robustness and reliability under distribution shifts.
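The summary does not say how OrthoFormer embeds instrumental variables in a Transformer. As background only, the sketch below shows the classical linear IV idea on simulated data: an instrument Z that moves X but reaches Y only through X lets the ratio cov(Z, Y) / cov(Z, X) recover the causal slope, while ordinary least squares absorbs the confounded path. The data-generating process, coefficients, and function names are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical linear model with an unobserved confounder U."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)   # unobserved confounder of X and Y
        zi = rng.gauss(0, 1)  # instrument: affects X, has no direct path to Y
        xi = zi + u + rng.gauss(0, 1)
        yi = true_effect * xi + 3.0 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols = cov(x, y) / cov(x, x)  # biased: picks up the U -> Y path through X
iv = cov(z, y) / cov(z, x)   # single-instrument IV estimate (equals 2SLS here)
print(f"OLS slope: {ols:.2f}, IV slope: {iv:.2f}")
```

With these coefficients the population OLS slope is 3.0 and the IV slope is 2.0, so the printed estimates should land close to those values.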
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers developed DICE-DML, a new framework that uses deepfake technology and machine learning to measure causal effects of visual attributes in digital advertising. The method addresses bias issues in standard approaches when analyzing how image elements like skin tone affect consumer engagement on social media platforms.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers have developed a method to implement Pearl's causal inference framework (do-calculus) on quantum circuits, mapping causal networks to quantum hardware through 'circuit surgery.' The approach was successfully demonstrated on IonQ's quantum processor using a healthcare model, showing agreement with classical baselines.
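The circuit-surgery construction itself is not described here. The sketch below shows only the classical quantity such a circuit would need to reproduce: P(Y | do(X)) computed by truncated factorization (cutting the Z → X edge) on a hypothetical three-variable model Z → X, Z → Y, X → Y, contrasted with ordinary conditioning. All probabilities are made up for illustration.

```python
# Hypothetical binary model: Z confounds X and Y, and X also causes Y.
P_Z = {0: 0.5, 1: 0.5}
P_X1_given_Z = {0: 0.2, 1: 0.8}
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    p1 = P_X1_given_Z[z]
    return p1 if x == 1 else 1.0 - p1


def p_y1_observational(x):
    """P(Y=1 | X=x): Z is reweighted by P(Z | X=x) via Bayes' rule."""
    joint = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}
    norm = sum(joint.values())
    return sum(joint[z] / norm * P_Y1_given_XZ[(x, z)] for z in P_Z)


def p_y1_do(x):
    """P(Y=1 | do(X=x)): cut the Z -> X edge and keep Z at its marginal."""
    return sum(P_Z[z] * P_Y1_given_XZ[(x, z)] for z in P_Z)


for x in (0, 1):
    print(x, round(p_y1_observational(x), 3), round(p_y1_do(x), 3))
```

Here conditioning gives P(Y=1 | X=1) = 0.62 against P(Y=1 | do(X=1)) = 0.5, because observing X=1 also makes Z=1 more likely.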
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce a novel observational study design called confounder detection via treatment intent to address unobserved confounding in causal inference from non-randomized data. By querying expert decision-makers about treatment allocation through principled matching, the method aims to identify hidden variables affecting outcomes, with proof-of-concept demonstrated in ICU treatment analysis using clinical text notes and NLP.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Causal Parametric Drift Simulation, a framework using Structural Causal Models as digital twins to evaluate machine learning classifier robustness against concept drift in dynamic environments. The method preserves causal dependencies in tabular data and identifies vulnerabilities that conventional statistical tests miss, demonstrated on mental health datasets.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce SVAR-FM, a framework that uses physics-based simulators to discover causal relationships in time series data by treating simulation interventions as Pearl's do operator. The method recovers correct causal directions where observational methods fail due to confounding, with theoretical guarantees and empirical validation across multiple scientific domains.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers rigorously tested claims that Mamba state-space models can discover causal structure through prediction-only training, finding the method underperforms classical approaches like PCMCI and Granger causality. The apparent success in earlier experiments was largely attributable to sample-size confounds and non-standard intervention semantics rather than genuine architectural advantages.
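For context on the classical baselines mentioned, the sketch below implements a bare-bones lag-1 linear Granger test: it asks whether adding the cause's lagged value reduces the residual sum of squares of an autoregression on the effect, summarized as an F statistic. It assumes zero-mean stationary series, a single lag, and no intercept; it is not PCMCI and not the paper's evaluation protocol, and the simulated series and helper names are hypothetical.

```python
import random


def rss_one(y, a):
    """Residual sum of squares of OLS of y on one regressor (no intercept)."""
    beta = sum(p * q for p, q in zip(a, y)) / sum(p * p for p in a)
    return sum((yi - beta * ai) ** 2 for yi, ai in zip(y, a))


def rss_two(y, a, b):
    """RSS of OLS of y on two regressors via the 2x2 normal equations."""
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * r for p, r in zip(a, y))
    sby = sum(q * r for q, r in zip(b, y))
    det = saa * sbb - sab * sab
    ba = (sbb * say - sab * sby) / det
    bb = (saa * sby - sab * say) / det
    return sum((yi - ba * ai - bb * bi) ** 2 for yi, ai, bi in zip(y, a, b))


def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect' at lag 1."""
    target, own_lag, cause_lag = effect[1:], effect[:-1], cause[:-1]
    rss_restricted = rss_one(target, own_lag)
    rss_full = rss_two(target, own_lag, cause_lag)
    dof = len(target) - 2  # observations minus parameters in the full model
    return (rss_restricted - rss_full) / (rss_full / dof)


# Hypothetical system: y_t depends on x_{t-1}, but x does not depend on y.
rng = random.Random(1)
x, y = [0.0], [0.0]
for _ in range(2000):
    x.append(0.5 * x[-1] + rng.gauss(0, 1))
    y.append(0.3 * y[-1] + 0.8 * x[-2] + rng.gauss(0, 1))  # x[-2] is x_{t-1}

print(f"F(x -> y) = {granger_f(x, y):.1f}, F(y -> x) = {granger_f(y, x):.1f}")
```

In this simulation y depends on lagged x but not the reverse, so the F statistic for x → y should be far larger than the one for y → x. Because the statistic grows with series length under a real effect, comparisons across methods are only fair at matched sample sizes.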
AI · Neutral · arXiv – CS AI · May 11 · 5/10
🧠Researchers introduce INCAMA, a novel method for inferring causal brain networks from indirect neuroimaging data like fMRI. The approach addresses the fundamental challenge that brain imaging signals are distorted by the physics of hemodynamics and volume conduction, making direct causal inference impossible without accounting for these measurement artifacts.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present C3, a novel credit assignment method for cooperative multi-agent LLM systems that achieves exact causal measurement without approximation by exploiting deterministic interaction histories. The method outperforms existing baselines across six benchmarks while reducing training costs, and introduces the first method-agnostic auditing tools for evaluating multi-agent credit assignment quality.
AI · Neutral · arXiv – CS AI · May 11 · 5/10
🧠Researchers propose a Three-in-One world-model architecture using Deep Boltzmann Machines to unify marketing decision-making by simultaneously capturing consumer heterogeneity, predicting outcomes, and enabling counterfactual reasoning about interventions. The approach outperforms existing causal inference baselines in recovering treatment effects, particularly for confounded price-promotion scenarios.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce BGM-IV, a Bayesian generative modeling framework that improves instrumental variable regression for causal inference by operating in a structured latent space rather than observed feature space. The method outperforms existing approaches in high-dimensional covariate settings while remaining competitive in classical low-dimensional scenarios, addressing a key limitation in nonlinear causal estimation.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce Causal EpiNets, a neural network framework that improves estimation of individual treatment effects using bounds on the Probability of Necessity and Sufficiency (PNS). The method resolves critical limitations in finite-sample estimation by guaranteeing structural constraint satisfaction and correcting extremum bias, achieving better coverage and validity than standard plug-in estimators.
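As background for the PNS bounds mentioned, the sketch below computes the classical Tian and Pearl bounds on the probability of necessity and sufficiency from experimental (interventional) probabilities: max(0, P(y | do(x)) − P(y | do(x′))) ≤ PNS ≤ min(P(y | do(x)), 1 − P(y | do(x′))). It is a plug-in calculation only; Causal EpiNets' constraint guarantees and extremum-bias correction are not shown, and `pns_bounds` is a hypothetical helper.

```python
def pns_bounds(p_y_do_x, p_y_do_not_x):
    """Tian-Pearl bounds on PNS from P(y | do(x)) and P(y | do(x'))."""
    for p in (p_y_do_x, p_y_do_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper


print(pns_bounds(0.7, 0.2))
```

For example, `pns_bounds(0.7, 0.2)` gives a lower bound of about 0.5 and an upper bound of 0.7.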
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce CFM-SD, a causal discovery method that leverages physical simulators to identify cause-and-effect relationships in scientific domains while handling latent confounders—a common problem in molecular design and materials science. The approach achieves significantly higher accuracy than existing methods and demonstrates practical improvements in real-world applications like toxicity prediction and battery optimization.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose Deconfounded Hierarchical Gate (DHG), a novel approach to improve physics-constrained deep generative models' ability to extrapolate beyond training conditions. Counterintuitively, the authors find that excluding target-domain data during pretraining improves extrapolation performance by 39%, and the method achieves 46% better results on lithium-ion battery temperature prediction benchmarks.
AI · Neutral · arXiv – CS AI · May 11 · 5/10
🧠Researchers present a solution for selecting cost-effective experiments to narrow uncertainty bounds on partially identifiable causal effects from observational data. They formalize this as an NP-hard optimization problem and develop pruning algorithms that eliminate 50-88% of candidate experiments without exhaustive computation, demonstrated on real epidemiological datasets.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose Factored Classifier-Free Guidance (FCFG), a new technique that improves how diffusion models generate counterfactual images by enabling attribute-specific control rather than applying uniform guidance across all features. This advancement addresses a fundamental limitation in current methods that causes unrealistic spurious changes, enhancing the accuracy of hypothetical outcome simulations in both natural and medical imaging applications.
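For reference, standard classifier-free guidance combines unconditional and conditional noise predictions with a single scale w: ε = ε_uncond + w · (ε_cond − ε_uncond). The sketch below contrasts that with a hypothetical per-attribute variant in which each attribute's conditional prediction gets its own scale; this composable-guidance form is an assumption for illustration, not necessarily FCFG's formulation. Noise predictions are toy vectors standing in for model outputs, and the attribute names are made up.

```python
def cfg(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: one scale for the whole condition."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def factored_cfg(eps_uncond, eps_per_attr, weights):
    """Hypothetical per-attribute guidance: each attribute has its own scale.

    eps_per_attr maps an attribute name to the noise prediction conditioned
    on that attribute alone; attributes missing from weights get scale 0.
    """
    out = list(eps_uncond)
    for attr, eps_attr in eps_per_attr.items():
        w = weights.get(attr, 0.0)
        out = [o + w * (e - u) for o, e, u in zip(out, eps_attr, eps_uncond)]
    return out


eps_u = [0.0, 0.0]
print(cfg(eps_u, [1.0, 1.0], 3.0))
print(factored_cfg(eps_u, {"age": [1.0, 0.0], "sex": [0.0, 1.0]},
                   {"age": 3.0, "sex": 0.0}))
```

With these toy inputs, uniform guidance shifts both coordinates to 3.0, while the factored variant moves only the "age" direction, giving [3.0, 0.0].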
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce a Dual Causal Adjustment Network (DCAN) to improve fairness in multimodal AI systems that assess personality traits from video data. The method addresses demographic and latent biases that cause unfair predictions across different population groups, achieving 92%+ accuracy while significantly improving fairness metrics.