FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning
FACT-E is a new evaluation framework that uses controlled perturbations to assess the faithfulness of Chain-of-Thought reasoning in large language models, addressing the problem of models generating seemingly coherent explanations with invalid intermediate steps. By measuring both internal chain consistency and answer alignment, FACT-E enables more reliable detection of flawed reasoning and selection of trustworthy reasoning trajectories for in-context learning.
Large language models have made significant strides in reasoning tasks through Chain-of-Thought prompting, yet a critical vulnerability persists: models frequently produce explanations that sound coherent while containing logically invalid intermediate steps. This disconnect between apparent coherence and actual faithfulness creates a major reliability problem for deploying LLMs in high-stakes domains where reasoning transparency matters. FACT-E addresses this by introducing a causality-inspired methodology that moves beyond simple coherence checks.
The framework's innovation lies in using controlled perturbations as instrumental signals to isolate genuine step-to-step dependencies from model biases. Rather than relying on the model to evaluate its own reasoning—a circular approach vulnerable to false confidence—FACT-E systematically probes whether intermediate steps actually drive the model's conclusions. By jointly optimizing for both intra-chain faithfulness and CoT-to-answer consistency, the framework ensures selected reasoning chains are internally sound and produce correct final answers.
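To make the perturbation idea concrete, here is a minimal sketch of how such an evaluation could be implemented. All names, the perturbation strategy, and the additive scoring rule are illustrative assumptions for exposition, not the paper's actual implementation: corrupt one intermediate step at a time, check whether the model's final answer changes (a step that genuinely drives the conclusion should be sensitive to corruption), and combine that faithfulness signal with agreement against the gold answer when ranking candidate chains.

```python
# Hypothetical sketch of perturbation-based faithfulness scoring.
# Names (Chain, perturb, faithfulness_score, select_chain) are illustrative,
# not taken from the FACT-E paper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chain:
    steps: List[str]   # intermediate reasoning steps
    answer: str        # final answer produced from these steps


def perturb(step: str) -> str:
    """Illustrative controlled perturbation: negate the step's claim."""
    return "It is NOT the case that " + step


def faithfulness_score(model: Callable[[List[str]], str], chain: Chain) -> float:
    """Fraction of steps whose corruption flips the model's final answer.

    If corrupting a step changes the answer, that step actually drives the
    conclusion; if the answer is unchanged, the step is merely decorative.
    """
    if not chain.steps:
        return 0.0
    sensitive = 0
    for i in range(len(chain.steps)):
        perturbed = chain.steps.copy()
        perturbed[i] = perturb(perturbed[i])
        if model(perturbed) != chain.answer:
            sensitive += 1
    return sensitive / len(chain.steps)


def select_chain(model: Callable[[List[str]], str],
                 chains: List[Chain], gold: str) -> Chain:
    """Rank chains by intra-chain faithfulness plus answer consistency."""
    def score(chain: Chain) -> float:
        consistency = 1.0 if chain.answer == gold else 0.0
        return faithfulness_score(model, chain) + consistency
    return max(chains, key=score)
```

In practice `model` would be an LLM queried with the (possibly perturbed) chain as context; here it is abstracted as any function from steps to an answer so the scoring logic stands alone.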
Experimental validation across GSM8K, MATH, and CommonsenseQA demonstrates measurable improvements in trajectory selection and in-context learning exemplar quality. The framework's ability to reliably detect flawed reasoning under noisy conditions positions it as a practical tool for researchers and practitioners building trustworthy reasoning systems. This work aligns with broader industry efforts to improve LLM interpretability and reliability, addressing a key concern for enterprise adoption where understanding how models reach conclusions is essential for compliance and debugging.
- FACT-E uses controlled perturbations to distinguish genuine reasoning dependencies from model biases in Chain-of-Thought explanations
- The framework evaluates both internal chain faithfulness and answer consistency to select truly trustworthy reasoning trajectories
- Experiments show improvements in reasoning-trajectory selection and in-context learning performance across multiple benchmarks
- FACT-E demonstrates robust detection of flawed reasoning even under noisy conditions, enhancing LLM reliability assessment
- This approach addresses a critical gap where models appear coherent but contain invalid intermediate logical steps