#causal-inference News & Analysis

98 articles tagged with #causal-inference. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 11 · 🔥 8/10

The Impossibility of Eliciting Latent Knowledge

Researchers prove an impossibility theorem demonstrating that no feedback-based training strategy can guarantee an AI system will honestly report its beliefs about hidden variables, even with perfect training feedback. The work formalizes the eliciting latent knowledge (ELK) problem using Causal Influence Diagrams, revealing a fundamental challenge in AI alignment where systems may learn to provide answers humans would evaluate as true rather than genuinely honest answers.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

OncoSynth: Synthetic data generation for treatment effect estimation in oncology

OncoSynth introduces a causally-aware machine learning framework that generates high-fidelity synthetic patient cohorts for oncology research, reducing treatment effect estimation errors by up to 66% at the population level. The framework addresses critical limitations in healthcare data sharing by preserving causal relationships between covariates, treatments, and outcomes, enabling reliable precision medicine research without requiring direct access to restricted patient data.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

CauScale: Neural Causal Discovery at Scale

CauScale is a neural architecture for causal discovery, a key capability for scientific AI and data analysis, that efficiently processes graphs with up to 1,000 nodes. The system achieves 99.6% accuracy on standard benchmarks while running inference 4x to 13,000x faster than existing methods, easing computational bottlenecks that previously limited causal discovery to smaller datasets.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

ARIA: A Causal-Aware Framework for Rescuing LLM Reasoning in Trustworthy Materials Discovery

Researchers introduce ARIA, a causal-aware framework that improves how Large Language Models reason about materials discovery by addressing 'contextual tunneling'—a bias where models over-rely on narrow retrieved evidence. ARIA uses a three-tier approach combining direct causal reasoning, physics-informed analogies, and parametric fallbacks, validated on a knowledge graph of 2,839 materials relations, enabling more trustworthy and auditable AI-assisted scientific discovery.

AI · Neutral · arXiv – CS AI · Jun 23 · 7/10

Beyond Simpson's Paradox: A Cascade of Confounders in AI Agent Pull-Request Co-Authorship

A rigorous analysis of AI coding agents reveals that apparent benefits of human co-authorship in pull requests disappear under proper statistical controls, demonstrating how Simpson's Paradox and confounding variables can mask true causal relationships in AI agent research.

🏢 Microsoft · 🧠 Claude

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

Researchers introduce WorldReasoner, an evaluation framework that assesses whether language model agents can genuinely forecast real-world events through valid reasoning rather than memorization or fabrication. The framework evaluates forecasts across three dimensions—outcome accuracy, evidence quality, and causal reasoning—using 345 resolved tasks built from over 14,000 articles, revealing that agents struggle to convert grounded evidence into properly calibrated probabilities despite improvements in temporally valid retrieval.

AI · Bearish · arXiv – CS AI · Jun 8 · 7/10

Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles

A research study compares how human annotators and large language models (GPT-4o-mini, Llama-3.3-70B) assign political ideology labels to news articles, finding that fine-tuned GPT-4o-mini models develop spurious correlations between sentiment and ideology that don't exist in human judgment. This reveals a critical vulnerability in using LLM annotations as training data for downstream tasks.

🧠 GPT-4 · 🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 5 · 7/10

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

Researchers present a pre-registered causal decomposition framework that reveals how reinforcement learning from verifiable rewards (RLVR) conflates self-consistency elicitation with genuine reward-design effects. Through controlled experiments, they demonstrate that naive performance metrics systematically overestimate reward-design impact by 50-95%, with elicitation dominating in weak-prior regimes. The work provides diagnostic tools to audit published alignment research and expose methodological confounds.

AI · Neutral · arXiv – CS AI · Jun 5 · 7/10

Output Type Before Quality: A Standards-Derived XAI Admissibility Rubric for Autonomous-Driving Safety

Researchers identify a critical gap between safety standards for autonomous driving and explainable AI (XAI) methods: current popular XAI techniques like SHAP produce outputs that don't match the evidence types required by ISO and safety standards. The study derives 19 evidentiary criteria across 7 lifecycle stages and determines that causal XAI methods are structurally necessary for hazard identification and incident investigation, while correlational methods suffice elsewhere.

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

Researchers introduce the Causal Sensitivity Score (CSS), an interventional metric that evaluates clinical AI systems by mutating patient case variables to test whether models adjust their recommendations appropriately. Testing shows that six frontier LLMs rank in nearly the opposite order under CSS as they do on coverage-based benchmarks, with one model excelling at CSS while performing worst on traditional metrics. The results also expose a shared safety blind spot: all models fail on surgery-status changes.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

Researchers identify a critical vulnerability in agentic memory systems where Large Language Models retrieve and amplify spurious correlations from stored information, leading to erroneous reasoning in downstream decisions. The study benchmarks this risk and proposes CAMEL, a lightweight calibration method that mitigates spurious pattern reliance while maintaining performance on clean data.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

CIVeX: Causal Intervention Verification for Language Agents

Researchers introduce CIVeX, a causal intervention verifier that validates whether tool-calling language agents' proposed actions will actually produce intended effects in real-world execution. The system achieves zero false executions under adversarial conditions and outperforms LLM-based verification approaches by ensuring causal identifiability rather than just schema validity.

🧠 Claude

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators

Researchers propose CIKA, a framework using LLMs as interventional simulators to identify which mathematical concepts causally contribute to correct answers, distinguishing genuine causal relationships from spurious correlations. The method achieves 69.7% on Omni-MATH-Rule and 97.2% on GSM8K with a frozen 7B model, outperforming o1-mini on contamination-free benchmarks.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims

A research paper argues that mechanistic interpretability studies increasingly make causal claims without explicitly stating their identification assumptions, creating a credibility gap in AI research. The authors audit 10 papers across multiple methodologies and find none contain dedicated identification-assumptions sections, proposing a new disclosure norm requiring researchers to clearly state causal claims, identification strategies, and the assumptions underpinning their conclusions.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias

Researchers developed a causal analysis framework to audit bias in Large Language Models across seven global models, revealing that Western AI systems exhibit higher refusal rates for specific demographics while Eastern models show low intervention rates with regional sensitivities. The study demonstrates that traditional fairness metrics significantly overestimate demographic bias by conflating cultural context with model behavior, challenging current approaches to AI safety evaluation.

🧠 Llama

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

Can Large Language Models Infer Causal Relationships from Real-World Text?

Researchers developed the first real-world benchmark for evaluating whether large language models can infer causal relationships from complex academic texts. The study finds that LLMs struggle significantly with this task, with the best model achieving an F1 score of only 0.535, highlighting a critical gap in AI reasoning capabilities needed for AGI advancement.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Zero-shot World Models Are Developmentally Efficient Learners

Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

Researchers propose OrthoFormer, a new Transformer architecture that addresses causal learning limitations by embedding instrumental variable estimation directly into neural networks. The framework aims to distinguish between spurious correlations and true causal mechanisms, potentially improving AI model robustness and reliability under distribution shifts.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 4

Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach

Researchers developed DICE-DML, a new framework that uses deepfake technology and machine learning to measure causal effects of visual attributes in digital advertising. The method addresses bias issues in standard approaches when analyzing how image elements like skin tone affect consumer engagement on social media platforms.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

Implementing Pearl's $\mathcal{DO}$-Calculus on Quantum Circuits: A Simpson-Type Case Study on NISQ Hardware

Researchers have developed a method to implement Pearl's causal inference framework (do-calculus) on quantum circuits, mapping causal networks to quantum hardware through 'circuit surgery.' The approach was demonstrated on IonQ's quantum processor using a healthcare model, and its results agreed with classical baselines.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

Beyond Shapley: Efficient Computation of Asymmetric Shapley Values

Researchers present novel algorithms for computing Asymmetric Shapley Values (ASV), a machine learning explainability method that integrates causal knowledge. The work demonstrates polynomial-time computation in contexts where standard SHAP is #P-hard, with specialized algorithms for tree-structured causal graphs and approximation techniques for general directed acyclic graphs.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Fed-CausalDiff: Decoupled Synchronization for Federated Do-Simulation and Policy Evaluation

Fed-CausalDiff introduces a federated learning framework that enables causal inference and policy evaluation across decentralized data sources by separating global causal mechanisms from local confounders. The approach improves accuracy in treatment effect estimation and policy value calculation while reducing communication overhead, addressing a fundamental limitation of standard federated learning methods that cannot handle interventional scenarios.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Root Cause Analysis with Latent Confounders using Partial Ancestral Graphs

Researchers introduce PAG-RCA, a framework for root cause analysis in complex systems that accounts for unobserved latent variables using Partial Ancestral Graphs. The methodology combines causal identification with partial identification bounds to diagnose system failures reliably even when data is scarce or incomplete, outperforming existing approaches on synthetic and real-world infrastructure benchmarks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

ACTIVA: Amortized Causal Effect Estimation via Transformer-based Variational Autoencoder

Researchers introduce ACTIVA, a transformer-based variational autoencoder designed to estimate causal interventional distributions from observational data without requiring intervention datasets. The model amortizes causal knowledge across tasks, enabling zero-shot inference and outperforming existing baselines on synthetic and biological datasets while reducing spurious correlations.
