y0news
🧠 AI | ⚪ Neutral | Importance 7/10

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

arXiv – CS AI | Yubo Li, Rema Padman, Ramayya Krishnan
🤖 AI Summary

Researchers identify source-dependence as a critical failure mode of retrieval-augmented generation (RAG): a multi-source medical AI system can give different answers to the same question depending on which institutional source it retrieves. The study introduces TransplantQA, HERO-QA, and an evaluation framework for auditing this phenomenon, and finds that source disagreement is far more prevalent than previous measurements indicated.

Analysis

RAG systems deployed in institutional settings often aggregate information from multiple sources, yet existing evaluation paradigms assume that a single correct answer exists. This research exposes a fundamental blind spot: when institutional sources legitimately disagree on medical guidance, current NLP metrics can neither diagnose nor measure how the system handles that disagreement. The study demonstrates this with transplant patient education, where institutional handbooks contain genuine conflicts in their recommendations.
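To make the failure mode concrete, here is a minimal sketch of a source-dependence check, assuming each answer to a question has already been generated from exactly one source. The function names, the toy answers, and the exact-match comparison are illustrative assumptions, not the paper's method; a real audit would replace string matching with a semantic or judge-based comparison.

```python
from itertools import combinations


def normalize(answer: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


def source_dependence(answers_by_source: dict[str, str]) -> dict:
    """Compare answers to ONE question, each generated from a single source.

    Returns the number of source pairs, the pairs that disagree, and the
    fraction of pairs that disagree. Exact match after normalization is a
    stand-in for a real comparison and will count paraphrases as disagreement.
    """
    pairs = list(combinations(sorted(answers_by_source), 2))
    disagreeing = [
        (a, b)
        for a, b in pairs
        if normalize(answers_by_source[a]) != normalize(answers_by_source[b])
    ]
    rate = len(disagreeing) / len(pairs) if pairs else 0.0
    return {"pairs": len(pairs), "disagreeing": disagreeing, "rate": rate}
```

With invented answers `{"src_a": "Wait 6 weeks.", "src_b": "wait 6  weeks.", "src_c": "Wait 4 weeks."}`, the first two normalize to the same string, so two of the three source pairs disagree and `rate` is 2/3. A single-answer metric would score only one of these answers and never see the split.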

The technical contribution centers on shifting evaluation from answer-level correctness to analysis of the relationships between sources. HERO-QA implements hierarchical retrieval that grounds each answer in a specific source, while a structured-output judge applies a validated 5-label taxonomy to classify how sources relate to one another. At scale, this approach uncovers substantially more disagreement than prior estimates suggested, indicating that earlier measurements underestimated the problem.
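The pipeline described above can be sketched roughly as follows. This is one plausible reading of "hierarchical retrieval" (pick passages within each source first, so every institution stays represented) plus a judge whose JSON output is validated against a fixed label set. The summary does not describe HERO-QA's internals or name the five labels, so the `Relation` values, the token-overlap scorer, and every function name here are hypothetical placeholders.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Hypothetical placeholder labels; the paper's actual 5-label taxonomy may differ.
    AGREE = "agree"
    COMPLEMENTARY = "complementary"
    PARTIAL_CONFLICT = "partial_conflict"
    CONFLICT = "conflict"
    NOT_COVERED = "not_covered"


def overlap(query: str, passage: str) -> int:
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def hierarchical_retrieve(
    query: str, corpus: dict[str, list[str]], k: int = 1
) -> dict[str, list[str]]:
    """Level 1: iterate over sources. Level 2: rank passages within each source.

    Each source keeps its own top-k, so one institution cannot crowd out the
    others the way a single global top-k ranking could.
    """
    return {
        source: sorted(passages, key=lambda p: overlap(query, p), reverse=True)[:k]
        for source, passages in corpus.items()
    }


@dataclass
class Judgment:
    source_a: str
    source_b: str
    relation: Relation
    rationale: str


def parse_judgment(raw: str) -> Judgment:
    """Validate a judge's JSON output; an unknown label raises ValueError."""
    data = json.loads(raw)
    return Judgment(
        source_a=data["source_a"],
        source_b=data["source_b"],
        relation=Relation(data["relation"]),
        rationale=data.get("rationale", ""),
    )


def conflict_prevalence(judgments: list[Judgment]) -> float:
    """Share of judged source pairs labeled as any kind of conflict."""
    conflicting = {Relation.CONFLICT, Relation.PARTIAL_CONFLICT}
    if not judgments:
        return 0.0
    return sum(j.relation in conflicting for j in judgments) / len(judgments)
```

Constraining the judge to an enum means a malformed or off-taxonomy label fails loudly instead of silently inflating or deflating the measured prevalence, which matters when the headline finding is a rate of disagreement.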

This work carries significant implications for deployed NLP systems in regulated domains. Medical AI systems must not only provide accurate information but also acknowledge source conflicts and uncertainty. The framework's domain-agnostic design transfers to legal and educational contexts, suggesting source-dependence is a systemic issue across knowledge work applications. For organizations deploying RAG systems in high-stakes environments, this research necessitates rethinking evaluation protocols and system transparency.

Looking forward, the field must develop standardized approaches to auditing source-dependence in production systems. This includes determining when a system should acknowledge disagreement and when it should synthesize a consensus, and how to communicate uncertainty to end users. The work establishes source-dependence as a legitimate axis of NLP evaluation rather than an edge case.

Key Takeaways
  • Source-dependence in multi-source RAG systems represents a critical evaluation gap not captured by single-answer correctness metrics.
  • Institutional sources in regulated domains like medicine frequently contain genuine disagreements requiring explicit auditing mechanisms.
  • HERO-QA and structured taxonomy approaches enable systematic measurement of inter-source relationships at scale.
  • Better retrieval methods reveal a substantially higher prevalence of source disagreement than prior estimates indicated.
  • Source-dependence auditing is domain-agnostic and applies to every multi-source NLP system deployed for knowledge work.