AI · Neutral · arXiv – CS AI · May 12 · 7/10
🧠Researchers identify a critical vulnerability in agentic memory systems where Large Language Models retrieve and amplify spurious correlations from stored information, leading to erroneous reasoning in downstream decisions. The study benchmarks this risk and proposes CAMEL, a lightweight calibration method that mitigates spurious pattern reliance while maintaining performance on clean data.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce CIVeX, a causal intervention verifier that validates whether tool-calling language agents' proposed actions will actually produce intended effects in real-world execution. The system achieves zero false executions under adversarial conditions and outperforms LLM-based verification approaches by ensuring causal identifiability rather than just schema validity.
🧠 Claude
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠A research paper argues that mechanistic interpretability studies increasingly make causal claims without explicitly stating their identification assumptions, creating a credibility gap in AI research. The authors audit 10 papers across multiple methodologies and find none contain dedicated identification-assumptions sections, proposing a new disclosure norm requiring researchers to clearly state causal claims, identification strategies, and the assumptions underpinning their conclusions.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose CIKA, a framework using LLMs as interventional simulators to identify which mathematical concepts causally contribute to correct answers, distinguishing genuine causal relationships from spurious correlations. The method achieves 69.7% on Omni-MATH-Rule and 97.2% on GSM8K with a frozen 7B model, outperforming o1-mini on contamination-free benchmarks.
AI · Neutral · arXiv – CS AI · May 9 · 7/10
🧠Researchers developed a causal analysis framework to audit bias in Large Language Models across seven global models, revealing that Western AI systems exhibit higher refusal rates for specific demographics while Eastern models show low intervention rates with regional sensitivities. The study demonstrates that traditional fairness metrics significantly overestimate demographic bias by conflating cultural context with model behavior, challenging current approaches to AI safety evaluation.
🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers developed the first real-world benchmark for evaluating whether large language models can infer causal relationships from complex academic texts. The study reveals that LLMs struggle significantly with this task, with the best models achieving an F1 score of only 0.535, highlighting a critical gap in AI reasoning capabilities needed for AGI advancement.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose OrthoFormer, a new Transformer architecture that addresses causal learning limitations by embedding instrumental variable estimation directly into neural networks. The framework aims to distinguish between spurious correlations and true causal mechanisms, potentially improving AI model robustness and reliability under distribution shifts.
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠Researchers developed DICE-DML, a new framework that uses deepfake technology and machine learning to measure causal effects of visual attributes in digital advertising. The method addresses bias issues in standard approaches when analyzing how image elements like skin tone affect consumer engagement on social media platforms.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers have developed a method to implement Pearl's causal inference framework (do-calculus) on quantum circuits, mapping causal networks to quantum hardware through 'circuit surgery.' The approach was successfully demonstrated on IonQ's quantum processor using a healthcare model, showing agreement with classical baselines.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠A comprehensive study of Markov boundaries in tabular prediction reveals that while oracle boundaries significantly improve model performance, practical causal discovery methods fail to recover them cost-effectively. The research identifies fundamental misalignments between optimizing for structural recovery and optimizing for predictive performance, suggesting that prediction-focused feature selection requires different approaches than theoretical assumptions imply.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce Query2Effect, a 72,000-question benchmark for predicting causal effect sizes from natural language queries using LLMs. A two-step framework combining structured representation generation with supervised encoding reduces prediction error by 27-71% compared to standard LLMs, demonstrating that separating semantic interpretation from numerical estimation improves both in-domain performance and out-of-domain generalization.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers present Nested Causal Thompson Sampling (NCTS), a machine learning framework for sequential decision-making where strategic choices causally influence subsequent tactical decisions across multiple timescales. The work introduces PAC-Bayesian risk bounds that enable off-policy certification of deployment policies from historical data alone, enabling safer handover from legacy systems to learned agents.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers present a new matrix completion approach for estimating heterogeneous treatment effects in panel data, achieving improved row-wise error bounds of Õ(√(1/n + n/m²)) without requiring knowledge of treatment propensities. The work establishes the first sharp row-wise perturbation bounds for low-rank approximation, advancing causal inference methodology.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠A new framework argues that AI in biomedicine is transitioning from predictive systems based on historical data to interventional intelligence that can model biological responses to novel therapies. The shift reflects a fundamental architectural limitation: traditional AI cannot reason about unseen interventions, making disease-level models that simulate outcomes under perturbation essential for clinical decision-making.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose a novel machine learning framework for estimating individual treatment effects from graph-structured data that explicitly models differentiated networked effects—how neighbors of varying importance and scales influence outcomes. The method uses partial attention mechanisms and message amplifiers to improve accuracy in observational studies across commerce and medicine.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose that human behavioral variability stems from dynamic latent states—weighted neural-psychological conditions that determine how individuals process decisions moment-to-moment. Drawing on 24 months of data from 200,000+ users, the framework suggests human outcomes are causally controllable through state-targeted interventions, with implications for AI personalization, digital health, and behavioral prediction systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce FedMPT, a novel federated learning method for multi-label recognition in vision-language models that addresses overfitting to spurious label correlations in decentralized settings. The approach uses causal modeling, LLM-driven condition analysis, and optimal transport mechanisms to improve model robustness when adapting to clients with heterogeneous private data.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce GS-Fuse, a machine learning framework that improves financial forecasting by intelligently combining event-driven text with price data. The system uses causal analysis to determine when news actually predicts market movements, addressing a key limitation in existing multimodal AI models that treat all data sources equally.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce a novel observational study design called confounder detection via treatment intent to address unobserved confounding in causal inference from non-randomized data. By querying expert decision-makers about treatment allocation through principled matching, the method aims to identify hidden variables affecting outcomes, with proof-of-concept demonstrated in ICU treatment analysis using clinical text notes and NLP.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers rigorously tested claims that Mamba state-space models can discover causal structure through prediction-only training, finding the method underperforms classical approaches like PCMCI and Granger causality. The apparent success in earlier experiments was largely attributable to sample-size confounds and non-standard intervention semantics rather than genuine architectural advantages.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Causal Parametric Drift Simulation, a framework using Structural Causal Models as digital twins to evaluate machine learning classifier robustness against concept drift in dynamic environments. The method preserves causal dependencies in tabular data and identifies vulnerabilities that conventional statistical tests miss, demonstrated on mental health datasets.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce SVAR-FM, a framework that uses physics-based simulators to discover causal relationships in time series data by treating simulation interventions as Pearl's do operator. The method recovers correct causal directions where observational methods fail due to confounding, with theoretical guarantees and empirical validation across multiple scientific domains.
AI · Neutral · arXiv – CS AI · May 11 · 5/10
🧠Researchers present a solution for selecting cost-effective experiments to narrow uncertainty bounds on partially identifiable causal effects from observational data. They formalize this as an NP-hard optimization problem and develop pruning algorithms that eliminate 50-88% of candidate experiments without exhaustive computation, demonstrated on real epidemiological datasets.