38 articles tagged with #clinical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv โ CS AI ยท 1d ago7/10
๐ง Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.
AIBearisharXiv โ CS AI ยท 2d ago7/10
๐ง Researchers discovered that at least 27% of labels in MedCalc-Bench, a clinical benchmark partly created with LLM assistance, contain errors or are incomputable. A physician-reviewed subset showed their corrected labels matched physician ground truth 74% of the time versus only 20% for original labels, revealing that LLM-assisted benchmarks can systematically distort AI model evaluation and training without active human oversight.
AIBullisharXiv โ CS AI ยท Mar 277/10
๐ง Researchers developed AD-CARE, an AI agent that uses large language models to diagnose Alzheimer's disease from incomplete medical data across multiple modalities. The system achieved 84.9% diagnostic accuracy across 10,303 cases and improved physician decision-making speed and accuracy in clinical studies.
AINeutralarXiv โ CS AI ยท Mar 177/10
๐ง Researchers convened a February 2025 workshop to explore how meta-research methodologies can enhance Trustworthy AI (TAI) implementation in healthcare. The study identifies key challenges including robustness, reproducibility, clinical integration, and transparency gaps, proposing a roadmap for interdisciplinary collaboration between TAI and meta-research fields.
AIBullisharXiv โ CS AI ยท Mar 117/10
๐ง Google's AMIE conversational AI successfully completed a clinical feasibility study with 100 patients at an academic medical center, demonstrating 90% accuracy in including correct diagnoses and achieving high patient satisfaction. The AI showed comparable diagnostic quality to primary care physicians while requiring no safety interventions during real-world clinical interactions.
AIBullisharXiv โ CS AI ยท Mar 117/10
๐ง Researchers developed EyExIn, a new AI framework that addresses critical gaps in large vision language models for medical diagnosis by anchoring them with domain-specific expert knowledge. The system uses dual-stream encoding and deep expert injection to improve accuracy in ophthalmic diagnosis, outperforming existing proprietary systems across four benchmarks.
AIBearisharXiv โ CS AI ยท Mar 57/10
๐ง Researchers developed SycoEval-EM, a framework testing how large language models resist patient pressure for inappropriate medical care in emergency settings. Testing 20 LLMs across 1,875 encounters revealed acquiescence rates of 0-100%, with models more vulnerable to imaging requests than opioid prescriptions, highlighting the need for adversarial testing in clinical AI certification.
AINeutralarXiv โ CS AI ยท Mar 57/10
๐ง Researchers propose RAG-X, a diagnostic framework for evaluating retrieval-augmented generation systems in medical AI applications. The study reveals an 'Accuracy Fallacy' showing a 14% gap between perceived system success and actual evidence-based grounding in medical question-answering systems.
AIBullisharXiv โ CS AI ยท Mar 47/102
๐ง Researchers have released MedXIAOHE, a new medical vision-language AI foundation model that achieves state-of-the-art performance across medical benchmarks and surpasses leading closed-source systems. The model incorporates advanced features like entity-aware pretraining, reinforcement learning for medical reasoning, and evidence-grounded report generation to improve reliability in clinical applications.
AIBullisharXiv โ CS AI ยท Mar 47/103
๐ง Researchers developed GLEAN, a new AI verification framework that improves reliability of LLM-powered agents in high-stakes decisions like clinical diagnosis. The system uses expert guidelines and Bayesian logistic regression to better verify AI agent decisions, showing 12% improvement in accuracy and 50% better calibration in medical diagnosis tests.
AIBullisharXiv โ CS AI ยท Mar 37/104
๐ง Researchers developed SpiroLLM, the first multimodal large language model capable of understanding spirogram time series data for COPD diagnosis. Using data from 234,028 UK Biobank individuals, the model achieved 0.8977 diagnostic AUROC and maintained 100% valid response rate even with missing data, far outperforming text-only models.
AIBullishOpenAI News ยท Jul 227/103
๐ง OpenAI and Penda Health have launched an AI clinical copilot that demonstrated a 16% reduction in diagnostic errors during real-world healthcare applications. This collaboration represents a significant advancement in practical AI implementation for medical diagnostics and patient care.
AINeutralarXiv โ CS AI ยท 1d ago6/10
๐ง Researchers propose a multi-layer AI agent framework designed to support longitudinal health tasks over extended periods, addressing critical gaps in current implementations around user intent, accountability, and sustained goal alignment. The framework emphasizes adaptation, coherence, continuity, and agency across repeated interactions, offering guidance for developing safer, more personalized health AI systems that move beyond isolated interventions.
AINeutralarXiv โ CS AI ยท 6d ago6/10
๐ง Researchers have developed a comprehensive evaluation framework for Large Language Models applied to outpatient referral systems in healthcare, revealing that LLMs offer limited advantages over simpler BERT-like models in static referral tasks but demonstrate potential in interactive dialogue scenarios. The study addresses the absence of standardized evaluation criteria for assessing LLM effectiveness in dynamic healthcare settings.
AIBullisharXiv โ CS AI ยท Mar 176/10
๐ง Researchers introduce Reason2Decide, a two-stage training framework that improves clinical decision support systems by aligning AI explanations with predictions. The system achieves better performance than larger foundation models while using 40x smaller models, making clinical AI more accessible for resource-constrained deployments.
AIBullisharXiv โ CS AI ยท Mar 176/10
๐ง Researchers introduce EviAgent, a new AI system for automated radiology report generation that provides transparent, evidence-driven analysis. The system addresses key limitations of current medical AI models by offering traceable decision-making and integrating external domain knowledge, outperforming existing specialized medical models in testing.
AIBullisharXiv โ CS AI ยท Mar 166/10
๐ง Researchers introduce DeCode, a training-free framework that adapts large language models to provide better contextualized medical answers by decoupling content from delivery. The system significantly improves clinical question answering performance, boosting zero-shot results from 28.4% to 49.8% on medical benchmarks.
๐ข OpenAI
AINeutralarXiv โ CS AI ยท Mar 36/107
๐ง A research study evaluated how four major large language models (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) respond to patient preferences in clinical decision-making scenarios. While all models acknowledged patient values, they showed modest actual recommendation shifting with value sensitivity indices ranging from 0.13 to 0.27, revealing gaps in how AI systems incorporate patient preferences into medical recommendations.
AIBullisharXiv โ CS AI ยท Mar 36/1010
๐ง Researchers propose ClinCoT, a new framework for medical AI that improves Visual Language Models by grounding reasoning in specific visual regions rather than just text. The approach reduces factual hallucinations in medical AI systems by using visual chain-of-thought reasoning with clinically relevant image regions.
AIBullisharXiv โ CS AI ยท Mar 36/106
๐ง Researchers developed TARSE, a new AI system for clinical decision-making that retrieves relevant medical skills and experiences from curated libraries to improve reasoning accuracy. The system performs test-time adaptation to align language models with clinically valid logic, showing improvements over existing medical AI baselines in question-answering benchmarks.
AIBearisharXiv โ CS AI ยท Mar 36/107
๐ง Researchers created PanCanBench, a comprehensive benchmark evaluating 22 large language models on pancreatic cancer-related patient questions, revealing significant variations in clinical accuracy and high hallucination rates. The study found that even top-performing models like GPT-4o and Gemini-2.5 Pro had hallucination rates of 6%, while newer reasoning-optimized models didn't consistently improve factual accuracy.
AIBullisharXiv โ CS AI ยท Mar 36/104
๐ง Researchers introduce BoxMed-RL, a new AI framework that uses chain-of-thought reasoning and reinforcement learning to generate spatially verifiable radiology reports. The system mimics radiologist workflows by linking visual findings to precise anatomical locations, achieving 7% improvement over existing methods in key performance metrics.
$LINK
AINeutralarXiv โ CS AI ยท Mar 26/1017
๐ง Researchers conducted a systematic benchmark study on multimodal fusion between Electronic Health Records (EHR) and chest X-rays for clinical decision support, revealing when and how combining data modalities improves healthcare AI performance. The study found that multimodal fusion helps when data is complete but benefits degrade under realistic missing data scenarios, and released an open-source benchmarking toolkit for reproducible evaluation.
AIBullisharXiv โ CS AI ยท Mar 27/1017
๐ง Researchers developed BUSD-Agent, an AI framework for breast cancer screening that uses cascaded agents and experience-guided decision-making to reduce unnecessary biopsies. The system achieved a 22% reduction in biopsy referrals while improving diagnostic accuracy through retrieval-based learning from past cases.
AIBullisharXiv โ CS AI ยท Feb 276/107
๐ง Researchers developed a framework for analyzing AI diagnostic systems in clinical settings by preserving original AI inferences and comparing them with physician corrections. The study of 21 dermatological cases showed 71.4% exact agreement between AI and physicians, with 100% comprehensive concordance when using structured analysis methods.