arXiv – CS AI · 6h ago
🧠
NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise
Researchers introduce NoisyCausal, a benchmark that tests how well large language models perform causal reasoning when the input is noisy, incomplete, or misleading. Alongside the benchmark, the study proposes a modular framework that pairs LLMs with explicit causal graph structures; the authors report significant gains over standard prompting baselines and stronger generalization to external benchmarks.
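The summary doesn't specify the framework's interface, but the core idea (conditioning an LLM on an explicitly stated causal graph rather than on free text alone) can be sketched roughly as below. The graph, node names, and prompt template are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: pairing an LLM prompt with an explicit causal graph.
# Everything here (the toy DAG, the prompt wording) is assumed for illustration.
import networkx as nx


def build_causal_graph() -> nx.DiGraph:
    """Toy causal DAG: rain -> wet_ground -> slippery; sprinkler -> wet_ground."""
    g = nx.DiGraph()
    g.add_edges_from([
        ("rain", "wet_ground"),
        ("sprinkler", "wet_ground"),
        ("wet_ground", "slippery"),
    ])
    return g


def graph_to_prompt(g: nx.DiGraph, question: str, noisy_context: str) -> str:
    """Serialize the DAG as explicit edges so the model can reason over the
    stated structure instead of the (possibly misleading) free-text context."""
    edges = "\n".join(f"- {u} causes {v}" for u, v in g.edges())
    return (
        "Known causal structure:\n"
        f"{edges}\n\n"
        "Context (may contain noise or irrelevant details):\n"
        f"{noisy_context}\n\n"
        f"Question: {question}\n"
        "Answer using only the causal structure above."
    )


if __name__ == "__main__":
    g = build_causal_graph()
    prompt = graph_to_prompt(
        g,
        question="If the ground is wet, does that prove it rained?",
        noisy_context=(
            "A neighbor claims wet ground always means rain. "
            "Also, the sprinkler ran last night."
        ),
    )
    print(prompt)  # Send to any LLM chat endpoint; no specific API is assumed.
```

The design intuition, if the abstract is read plainly, is that making the causal edges explicit in the prompt gives the model a structure to check noisy claims against, rather than asking it to infer the graph and answer the question in one step.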