#evidence-grounding News & Analysis

8 articles tagged with #evidence-grounding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 23 · 7/10

GroundEval: A Deterministic Replacement for LLM-as-Judge in Stateful Agent Evaluation

GroundEval introduces a deterministic framework for evaluating AI agents by auditing their evidence retrieval and reasoning paths rather than relying on LLM judges. The tool detected a critical failure case where frontier LLM judges scored an agent response above 0.85, but the actual trace revealed the agent never retrieved the artifact it cited, yielding a GroundEval score of 0.000.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

MedVol-R1: Reward-Driven Evidence Grounding for Volumetric Reasoning Segmentation

MedVol-R1 introduces a reinforcement learning framework for volumetric reasoning segmentation in 3D medical imaging, decoupling evidence grounding from mask generation to improve interpretability and accuracy. The system uses an LVLM to identify key 2D evidence anchors before propagating them into coherent 3D segmentations, achieving state-of-the-art results on multiple medical imaging benchmarks without requiring expensive annotations.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Researchers introduce DocSeeker, a multimodal AI system that improves long document understanding through a structured workflow of analysis, localization, and reasoning. The work targets a known weakness of existing large language models, which struggle with lengthy documents because of high noise levels and weak training signals, and it reports stronger performance on both short and ultra-long documents.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

EGTR-Review: Efficient Evidence-Grounded Scientific Peer Review Generation via Multi-Agent Teacher Distillation

EGTR-Review is a framework for automating scientific peer review in which a multi-agent teacher model distills its reasoning into a lightweight student model. The approach reportedly achieves stronger performance at significantly lower computational cost while preserving evidence traceability and factual grounding.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment

Researchers propose an adaptive interview framework to improve how large language models simulate individual decision-making by gathering persona-relevant information through structured dialogue. The study finds that richer contextual information alone doesn't guarantee better accuracy; LLMs improve predictions (45.5% vs. 39.3%) only when they actively ground decisions in user-specific evidence extracted during follow-up questions.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges

Researchers introduce Rulers, a three-stage framework that improves how large language models evaluate text against human rubrics by converting qualitative criteria into locked specifications, structured checklists with evidence grounding, and calibrated score interpretation. The approach addresses three key failure modes in LLM-based scoring and demonstrates stronger alignment with human scoring across multiple benchmarks in essay evaluation, summarization, and writing assessment.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

Researchers introduce CARE, an evidence-grounded agentic framework for medical AI that improves clinical accountability by decomposing tasks into specialized modules rather than using black-box models. The system achieves 10.9% better accuracy than state-of-the-art models by incorporating explicit visual evidence and coordinated reasoning that mimics clinical workflows.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

DeepXiv-SDK: An Agentic Data Interface for Scientific Papers

DeepXiv-SDK introduces an agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns; it is currently deployed at arXiv scale with free API access.