Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking
Researchers introduce PAVE, a diagnostic framework for evaluating how large language models arbitrate between their parametric knowledge and retrieved evidence in RAG-based fact-checking systems. Testing across seven LLMs reveals inconsistent and model-dependent behavior when prior knowledge conflicts with retrieved context, prompting the development of a lightweight test-time correction method to improve factual reliability.
This research addresses a critical vulnerability in retrieval-augmented generation (RAG) systems, which increasingly power fact-checking and knowledge verification applications. LLMs carry parametric knowledge from their training data that can conflict with freshly retrieved evidence, yet existing evaluation frameworks typically do not measure this tension. The PAVE testbed isolates the problem by constructing scenarios in which an LLM's pre-evidence confidence and correctness are known, then measuring whether the model holds on to correct priors despite misleading evidence and whether it revises false priors when accurate evidence arrives.
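To make the idea of a pre-evidence epistemic state concrete, here is a minimal sketch of how claims might be bucketed before any evidence is shown. It assumes the four states are the cross of confident/uncertain and correct/incorrect, which is one natural reading of the description above rather than the framework's confirmed definition. The `PriorProbe` type, the label strings, and the 0.8 confidence threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PriorProbe:
    """One closed-book (no evidence shown) verdict from a model. Hypothetical type."""
    predicted_label: str   # the model's verdict before seeing any evidence
    confidence: float      # the model's probability for that verdict, in [0, 1]
    gold_label: str        # the ground-truth verdict for the claim


def epistemic_state(probe: PriorProbe, threshold: float = 0.8) -> str:
    """Map a probe to one of four states (assumed: confidence x correctness).

    The threshold is an illustrative placeholder, not a value from the source.
    """
    confident = probe.confidence >= threshold
    correct = probe.predicted_label == probe.gold_label
    return f"{'confident' if confident else 'uncertain'}-{'correct' if correct else 'incorrect'}"


def stratify(probes: list[PriorProbe], threshold: float = 0.8) -> Counter:
    """Count how many probes fall into each epistemic state."""
    return Counter(epistemic_state(p, threshold) for p in probes)
```

Once claims are grouped this way, the same claims can be re-asked with supporting or misleading evidence attached, and the change in verdict can be compared per group.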
The findings are sobering: different LLM families show markedly different arbitration patterns, and no single strategy for balancing parametric and contextual signals emerges across them. This inconsistency likely reflects deeper architectural and training differences across models. For developers deploying RAG systems in high-stakes domains such as legal review, medical fact-checking, or financial analysis, model selection directly affects reliability in ways that standard benchmarks miss.
The proposed arbitration method, based on Jensen-Shannon divergence (JSD), offers a practical path forward without requiring model retraining. By adjusting how models weight their internal signals at inference time, this approach improves factual accuracy across diverse architectures. The research establishes that verifier reliability cannot be assumed from general-purpose benchmarks; it demands epistemic-state-specific evaluation.
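For readers unfamiliar with the quantity involved, the sketch below computes the Jensen-Shannon divergence between a model's verdict distribution before and after it sees retrieved evidence. The divergence itself is standard. The `arbitrate` rule and its threshold `tau` are purely illustrative assumptions about how such a signal could be used at test time; they are not the method described in the research, whose details are not given here.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def arbitrate(prior: list[float], posterior: list[float], labels: list[str],
              tau: float = 0.1) -> str:
    """Hypothetical rule, NOT the paper's method.

    If the evidence shifted the verdict distribution by at least tau,
    return the evidence-conditioned verdict; otherwise keep the prior verdict.
    """
    shift = js_divergence(prior, posterior)
    chosen = posterior if shift >= tau else prior
    return labels[chosen.index(max(chosen))]
```

Because the base-2 divergence is bounded between 0 and 1, a single threshold can be compared across models with different label sets, although any specific value would need to be tuned on held-out data.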
Looking ahead, RAG-based systems will likely require certification frameworks that explicitly test prior-context arbitration before deployment. As LLMs become fact-checking infrastructure for autonomous systems and critical applications, understanding these failure modes becomes essential for responsible AI deployment.
- LLM verifiers show unreliable and model-dependent behavior when parametric knowledge conflicts with retrieved evidence in RAG systems.
- PAVE diagnostic framework stratifies models into four epistemic states to measure prior-context arbitration without requiring model retraining.
- Different LLM families exhibit dramatically different tendencies to defer to retrieved evidence over internal parametric knowledge.
- A lightweight test-time arbitration method using Jensen-Shannon Divergence improves factual reliability across diverse LLM architectures.
- Verifier selection significantly impacts fact-checking reliability in real-world applications, demanding epistemic-state-specific evaluation before deployment.