y0news

#clinical-ai News & Analysis

59 articles tagged with #clinical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values

Researchers introduce ShaQ, a Shapley-value-based framework that identifies which specific parts of user input cause uncertainty in large language models, rather than just flagging overall uncertainty. The method achieves state-of-the-art ambiguity detection on multiple benchmarks and demonstrates practical value in high-stakes domains like clinical settings by enabling targeted input clarification.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

MedVol-R1: Reward-Driven Evidence Grounding for Volumetric Reasoning Segmentation

MedVol-R1 introduces a reinforcement learning framework for volumetric reasoning segmentation in 3D medical imaging, decoupling evidence grounding from mask generation to improve interpretability and accuracy. The system uses an LVLM to identify key 2D evidence anchors before propagating them into coherent 3D segmentations, achieving state-of-the-art results on multiple medical imaging benchmarks without requiring expensive annotations.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration

GlobalDentBench introduces the first multinational dental benchmark with 8,978 expert-validated questions across 14 specialties, revealing that current LLMs face severe limitations in clinical reasoning with a 31.01% unsafe recommendation rate. The study demonstrates performance degrades sharply as reasoning complexity increases, with accuracy dropping from 81.34% on multiple-choice to just 22.34% on case-based questions, highlighting critical safety gaps before LLMs can be deployed in healthcare.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

Researchers introduce CLR-voyance, a framework that treats inpatient clinical reasoning as a partially observable decision process with outcome-grounded rewards validated by clinicians. The resulting CLR-voyance-8B model outperforms GPT-5 and larger medical models on clinical benchmarks while maintaining generalist capabilities, and has been deployed in a hospital for six months.

🧠 GPT-5
AI · Bullish · arXiv – CS AI · May 12 · 7/10

EpiGraph: A Knowledge Graph and Benchmark for Evidence-Intensive Reasoning in Epilepsy

Researchers have developed EpiGraph, a comprehensive knowledge graph containing 24,324 entities and 32,009 evidence-grounded triplets from 48,166 peer-reviewed papers to improve AI-driven epilepsy diagnosis and treatment. The accompanying EpiBench benchmark demonstrates that integrating structured clinical knowledge into large language models significantly enhances clinical reasoning, with improvements up to 41% in pharmacogenomic applications.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization

Researchers introduced Hygieia, an AI agent system that integrates phenotypic, genetic, and clinical data to diagnose rare diseases and prioritize risk genes. Validated with clinical experts from Yale and Duke-NUS, the system demonstrated 12-60% diagnostic accuracy improvements over physicians and reduced clinician workload in real-world applications.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Adoption and Use of LLMs at an Academic Medical Center

Researchers at an academic medical center developed ChatEHR, an LLM system integrated into electronic health records that enables both automated clinical tasks and interactive use across patient timelines. Over 1.5 years, the platform achieved adoption by 1,075 users conducting 23,000 sessions, generating an estimated $6M in first-year savings while maintaining vendor-agnostic governance.

AI · Bullish · arXiv – CS AI · May 1 · 7/10

End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians

Researchers present a comprehensive governance framework for deployed clinical AI systems, demonstrated through Hyperscribe, an EHR-embedded audio transcription agent. The study shows that continuous monitoring, controlled experimentation, and multi-channel feedback mechanisms can improve system performance from 84% to 95% accuracy while maintaining operational efficiency and cost-effectiveness.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

Researchers found that at least 27% of labels in MedCalc-Bench, a clinical benchmark partly created with LLM assistance, contain errors or are incomputable. On a physician-reviewed subset, the researchers' corrected labels matched physician ground truth 74% of the time, versus only 20% for the original labels, showing that LLM-assisted benchmarks can systematically distort AI model evaluation and training without active human oversight.

AI · Bullish · arXiv – CS AI · Mar 27 · 7/10

AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study

Researchers developed AD-CARE, an AI agent that uses large language models to diagnose Alzheimer's disease from incomplete medical data across multiple modalities. The system achieved 84.9% diagnostic accuracy across 10,303 cases and improved physician decision-making speed and accuracy in clinical studies.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

How Meta-research Can Pave the Road Towards Trustworthy AI In Healthcare: Catalogue of Ideas and Roadmap for Future Research

Researchers convened a February 2025 workshop to explore how meta-research methodologies can enhance Trustworthy AI (TAI) implementation in healthcare. The study identifies key challenges including robustness, reproducibility, clinical integration, and transparency gaps, proposing a roadmap for interdisciplinary collaboration between TAI and meta-research fields.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

Researchers developed EyExIn, a new AI framework that addresses critical gaps in large vision language models for medical diagnosis by anchoring them with domain-specific expert knowledge. The system uses dual-stream encoding and deep expert injection to improve accuracy in ophthalmic diagnosis, outperforming existing proprietary systems across four benchmarks.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

Google's AMIE conversational AI successfully completed a clinical feasibility study with 100 patients at an academic medical center, demonstrating 90% accuracy in including correct diagnoses and achieving high patient satisfaction. The AI showed comparable diagnostic quality to primary care physicians while requiring no safety interventions during real-world clinical interactions.

AI · Bearish · arXiv – CS AI · Mar 5 · 7/10

SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

Researchers developed SycoEval-EM, a framework that tests how well large language models resist patient pressure for inappropriate medical care in emergency settings. Testing 20 LLMs across 1,875 encounters revealed acquiescence rates ranging from 0% to 100%, with models more vulnerable to imaging requests than to opioid prescription requests, highlighting the need for adversarial testing in clinical AI certification.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

Researchers have released MedXIAOHE, a new medical vision-language AI foundation model that achieves state-of-the-art performance across medical benchmarks and surpasses leading closed-source systems. The model incorporates advanced features like entity-aware pretraining, reinforcement learning for medical reasoning, and evidence-grounded report generation to improve reliability in clinical applications.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

Researchers developed GLEAN, a new AI verification framework that improves reliability of LLM-powered agents in high-stakes decisions like clinical diagnosis. The system uses expert guidelines and Bayesian logistic regression to better verify AI agent decisions, showing 12% improvement in accuracy and 50% better calibration in medical diagnosis tests.

AI · Bullish · OpenAI News · Jul 22 · 7/10

Pioneering an AI clinical copilot with Penda Health

OpenAI and Penda Health have launched an AI clinical copilot that demonstrated a 16% reduction in diagnostic errors during real-world healthcare applications. This collaboration represents a significant advancement in practical AI implementation for medical diagnostics and patient care.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Do Clinical Models Change Treatment Decisions?

Researchers introduce ClinPivot, a benchmark testing whether clinical AI models adjust treatment decisions when patient contexts change. The study reveals that strong medical QA performance does not correlate with sound clinical decision-making, with leading models often failing to modify treatment choices appropriately when clinical constraints shift.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

BuddyBench: A Privacy-Constrained Multi-Task Benchmark for Pediatric Social-Communication Personalization

BuddyBench introduces a privacy-protected multi-task benchmark dataset combining clinical assessments, learning trajectories, and treatment outcomes for pediatric social-communication research. The dataset integrates two cohorts (189 observational and 86 randomized controlled trial participants) to enable knowledge tracing, clinical prediction, and causal inference while maintaining pediatric data protection standards.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

Researchers introduce MeDial-Speech, a new 111+ hour speech dataset for training medical AI systems to conduct patient consultations across four health conditions. The study benchmarks state-of-the-art LLMs including Claude Sonnet 4, GPT-5 mini, and DeepSeek-V3, revealing that while Claude Sonnet 4 achieves 71-75% accuracy in medical dialogue tasks, all models exhibit significant overconfidence in their probabilistic predictions.

🏢 Hugging Face · 🧠 GPT-5 · 🧠 Claude
AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Vital Trace: Protocol-Constrained Patient-State Reasoning for Longitudinal Clinical Trajectories

Researchers present Vital Trace, a protocol-constrained multi-agent AI framework designed to improve clinical risk prediction in intensive care units by tracking patient trajectories over extended periods. The system uses compact patient-state memory and structured reasoning agents rather than unbounded text histories, demonstrating better temporal consistency and interpretability on MIMIC-IV and eICU datasets.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

BioFact-MoE: Biologically Factorized Mixture of Experts for Vision-Language Prognostic Modeling in Hepatocellular Carcinoma

Researchers have developed BioFact-MoE, a machine learning framework that uses specialized expert networks to separately analyze liver and tumor factors in hepatocellular carcinoma prognosis. The model achieves superior survival prediction accuracy (75%+ AUC at 12-18 months) while providing interpretable biological insights into treatment heterogeneity.
