#clinical-ai News & Analysis

131 articles tagged with #clinical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

TTFT-Aware Graph Chain-of-Thought: Distance-Indexed Neural A* for Low-Hallucination Multi-Hop Medical Reasoning

Researchers present a production-grade GraphRAG system for medical LLMs that curbs hallucinations by constraining answers to verifiable paths within a 700K-node medical knowledge graph. Using Pruned Landmark Labeling and AStarNet heuristics, the system improves clinical reasoning accuracy while lowering latency and hallucination rates in fertility assistant applications.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

From Handcrafted Features to Functional Edge Learning: Evolution of EEG Seizure Detection Frameworks

A comprehensive review examines how Kolmogorov-Arnold Networks (KANs) can overcome critical limitations in deep learning-based EEG seizure detection, offering improved interpretability, parameter efficiency, and performance under data scarcity constraints. The research positions KANs as a paradigm shift necessary for deploying transparent, clinically viable seizure detection systems in wearable and implantable neuromodulation devices.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

VISTA Architect: A graph database-oriented health AI system demonstrated in multidisciplinary tumor boards

Stanford Medicine researchers unveiled VISTA Architect, a graph database-powered AI system that integrates large language models with electronic health records to achieve 96.4% accuracy in clinical data extraction for tumor board preparation. The architecture precomputes patient histories into organized knowledge graphs, reducing processing time and latency compared to traditional RAG approaches while maintaining full data provenance.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs

ProMed introduces a reinforcement learning framework that transforms medical LLMs from reactive to proactive systems, using Shapley Information Gain to guide which clinical questions the model asks. The approach achieves a 54.45% improvement over baseline reactive models and demonstrates strong generalization across medical benchmarks.
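
To illustrate the general idea of Shapley-style credit assignment mentioned in this summary, the sketch below computes exact Shapley values for a toy set of clinical questions. It is not ProMed's method: the function `shapley_values`, the value function `toy_value`, and the question names are all hypothetical, chosen only to show how marginal contributions are averaged over orderings.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = frozenset()
        for p in order:
            with_p = seen | {p}
            totals[p] += value(with_p) - value(seen)
            seen = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(answered):
    """Hypothetical 'information' score for a set of answered questions."""
    score = 0.0
    if "fever" in answered:
        score += 1.0
    if "cough" in answered:
        score += 0.5
    if {"fever", "cough"} <= answered:
        score += 0.5  # assumed interaction: the two answers together say more
    return score


values = shapley_values(["fever", "cough", "age"], toy_value)
for question, credit in values.items():
    print(f"{question}: {credit:.2f}")
```

In this toy setup the credits sum to the value of asking all three questions, and a question that never changes the score ("age") receives zero credit. Exact enumeration grows factorially with the number of questions, so a practical system would presumably rely on sampling or another approximation.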

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

Retrieval-Augmented Anatomical Guidance for Text-to-CT Generation

Researchers propose a retrieval-augmented approach for generating CT scans from radiology reports that combines semantic control with anatomical consistency by retrieving structurally similar clinical cases and using their annotations as guidance. The method improves image fidelity and clinical consistency compared to text-only baselines while enabling spatial controllability without requiring ground-truth annotations at inference time.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering

Render-FM is a feedforward neural model that generates photorealistic 3D renderings of CT scans in 2.8 seconds, achieving a 500x speedup over traditional optimization methods. By directly predicting Gaussian Splatting parameters with anatomy-guided priors, the model enables real-time clinical visualization without per-scan training, making advanced volumetric rendering practical for hospital workflows.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

🧠

EnTrust: Modeling Inter-Modal Conflict for Trustworthy Multimodal Medical Image Analysis

EnTrust is a new framework for multimodal medical image analysis that treats disagreement between imaging modalities as a direct source of predictive uncertainty rather than averaging it away. The approach combines feature decomposition, diffusion-based segmentation, and calibrated uncertainty estimation to help clinicians understand not just where predictions are uncertain, but why, achieving state-of-the-art accuracy across multiple medical imaging domains.

AI · Neutral · arXiv – CS AI · Jun 19 · 7/10

🧠

LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

Researchers demonstrate that Large Language Models lack genuine self-awareness regarding their knowledge limitations when applied to clinical tabular data, using cross-model attribution divergence to detect epistemic blind spots. LLM confidence scores remain constant regardless of actual accuracy, while a novel cross-model calibrator achieves reliable uncertainty quantification without model access or retraining.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

🧠

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

Researchers introduce cAPM, an AI-assisted system that uses continual learning and active learning to improve cardiac pace-mapping procedures for treating ventricular tachycardia. The system demonstrates 81% localization accuracy using only 4.5 pacing sites compared to 38% accuracy with 13.7 sites for existing methods, potentially reducing procedure time and patient risk.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

🧠

SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures

Researchers introduce SleepMaMi, a foundation model designed to analyze sleep patterns by capturing both hour-long sleep architecture and fine-grained biosignal features. Trained on over 20,000 polysomnography recordings, the model outperforms existing approaches and demonstrates superior generalizability for clinical sleep analysis applications.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

🧠

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

A large-scale study challenges the widespread assumption that fine-tuning language models with synthetic explanations improves clinical prediction performance. Researchers found that rationale-based supervised fine-tuning consistently degraded Alzheimer's disease prediction accuracy compared to label-only approaches, despite the rationales being medically accurate and human-verified.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

🧠

Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning

Researchers introduce Dep-LLM, a training-free framework that diagnoses depression from clinical interviews by decomposing dialogue into structured themes and using large language models without fine-tuning. The system outperforms supervised approaches and commercial LLMs while requiring no additional training, addressing critical gaps in mental health AI deployment.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology

PathPocket is a multimodal AI co-pilot system designed to assist pathologists by grounding diagnostic recommendations in verifiable medical evidence. Built on a comprehensive pathology knowledge base of 110,472 documents and 4.55 million entities, the system demonstrates significant improvements in diagnostic accuracy and pathologist confidence across 200,000+ real-world cases.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation

CURE is a curriculum learning framework that improves medical vision-language models' ability to generate accurate radiology reports with better visual grounding. The method achieves significant gains in grounding accuracy (+0.35 IoU), report quality (+0.192 CXRFEScore), and hallucination reduction (18.6%) without requiring additional training data.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings

Researchers developed GNOVA, a machine learning framework combining GRU neural networks with Neural ODEs and variational autoencoders to predict Alzheimer's disease progression using only routine clinical data without expensive neuroimaging. The model successfully reconstructed patient cognitive trajectories and forecasted future cognitive states with high accuracy across 1,727 ADNI patients over 10 years, enabling deployment in resource-constrained healthcare settings.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

🧠

Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy

Researchers developed AI-MASLD, a stress-testing framework that reveals safety failures in clinical large language models hidden by benchmark accuracy metrics. Testing seven models across 240 clinical cases showed that while models performed well under baseline conditions, realistic narrative stress caused sharp performance divergence, with quantized models masking functional collapse and medical fine-tuning degrading logical stability and fairness.

AI · Neutral · arXiv – CS AI · Jun 8 · 7/10

🧠

MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models

Researchers introduced MMBU, the largest biomedical vision-language benchmark covering 35 medical imaging modalities with structured metadata. Testing 15 open-weight and 2 frontier VLMs revealed that while medical adaptation helps some models, high reported accuracy on existing benchmarks masks significant deficiencies in visual perception and domain generalization.

AI · Bearish · arXiv – CS AI · Jun 3 · 7/10

🧠

MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

Researchers introduced MedCUA-Bench, a new benchmark for evaluating AI agents performing clinical computer tasks across 18 medical scenarios. The benchmark reveals significant performance gaps, with top closed-source models achieving only 54.2% success and open-source agents averaging just 2.5%, highlighting the unpreparedness of current AI systems for reliable medical software automation.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Transformer-Based Language Models

Researchers have developed a monosemantic attribution framework to improve interpretability of Transformer-based language models in clinical applications, particularly for Alzheimer's disease diagnosis. The framework addresses instability in existing attribution methods by reducing inter-method variability and providing stable, explicit importance scores for model predictions.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

ELF: A Family of Encoder-Free ECG-Language Models

Researchers introduce ELF, a family of encoder-free ECG-Language Models that simplify the architecture of existing multimodal models for automated heart rhythm interpretation. Despite using simpler designs and training pipelines than predecessor systems, ELF matches or exceeds state-of-the-art performance, suggesting that architectural complexity in medical AI may be unnecessary.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

A Foundation Model for Wearable Movement Data in Mental Health Research

Researchers developed PAT (Pretrained Actigraphy Transformer), an open-source foundation model that analyzes wearable movement data to predict mental health outcomes including depression, sleep disorders, and medication use. Trained on data from over 21,000 U.S. participants, PAT significantly outperforms traditional deep learning models while providing interpretable insights into behavioral patterns relevant to clinical decision-making.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

🧠

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Researchers introduce ClinEnv, an interactive benchmark that evaluates large language models as attending physicians making real clinical decisions across multiple stages of patient care. The study reveals that even the strongest models achieve a decision F1 score of only 0.31, with significant gaps between diagnostic accuracy and clinical management quality, exposing how outcome-focused evaluations mask deficiencies in information-gathering processes.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

🧠

Cross-modal linkage risk in clinical vision-language models

Researchers discovered that vision-language models trained on paired chest X-rays and medical reports can re-link de-identified images to their original reports through embedding similarity, creating a privacy vulnerability. The team demonstrated this risk scales with model specialization and developed a differential privacy technique that reduces re-linkage by 62% while preserving diagnostic utility.
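
As a rough illustration of what re-linking through embedding similarity can look like, the sketch below matches each image embedding to its nearest report embedding by cosine similarity and measures how often the true pair is recovered. The vectors, the function names (`cosine`, `relink_rate`), and the convention that index `i` in both lists refers to the same patient are assumptions made for this example; the study's actual models, metric, and differential privacy defense are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relink_rate(image_embs, report_embs):
    """Fraction of images whose most similar report is the true pair.

    Assumes image_embs[i] and report_embs[i] come from the same study.
    """
    hits = 0
    for i, img in enumerate(image_embs):
        sims = [cosine(img, rep) for rep in report_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(image_embs)


# Made-up 3-dimensional embeddings standing in for model outputs.
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
reports = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.5, 0.5, 0.5]]

rate = relink_rate(images, reports)
print(f"top-1 re-linkage rate: {rate:.2f}")
```

A high top-1 rate on de-identified data would indicate that the shared embedding space alone is enough to reconnect an image to its report, which is the kind of risk the summary describes.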

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

🧠

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

Researchers introduce EHRBench, an automated benchmark containing nearly 1 million QA items derived from real patient electronic health records to evaluate large language models on clinical decision-making tasks. The framework combines LLM-based template generation with knowledge-base verification to assess model performance on diagnosis, treatment, and prognosis at scale while maintaining reliability.

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

🧠

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

Researchers introduce the Causal Sensitivity Score (CSS), an interventional metric that evaluates clinical AI systems by mutating patient case variables to test whether models appropriately adjust recommendations. Testing reveals that six frontier LLMs rank nearly opposite to coverage-based benchmarks, with one model excelling at CSS while performing worst on traditional metrics, exposing a universal safety blind spot where all models fail on surgery-status changes.
