AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers developed TARSE, a new AI system for clinical decision-making that retrieves relevant medical skills and experiences from curated libraries to improve reasoning accuracy. The system performs test-time adaptation to align language models with clinically valid logic, showing improvements over existing medical AI baselines in question-answering benchmarks.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers created PanCanBench, a comprehensive benchmark evaluating 22 large language models on pancreatic cancer-related patient questions, revealing significant variations in clinical accuracy and high hallucination rates. The study found that even top-performing models like GPT-4o and Gemini-2.5 Pro had hallucination rates of 6%, while newer reasoning-optimized models didn't consistently improve factual accuracy.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers introduce BoxMed-RL, a new AI framework that uses chain-of-thought reasoning and reinforcement learning to generate spatially verifiable radiology reports. The system mimics radiologist workflows by linking visual findings to precise anatomical locations, achieving a 7% improvement over existing methods in key performance metrics.
AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 17
🧠Researchers conducted a systematic benchmark study on multimodal fusion between Electronic Health Records (EHR) and chest X-rays for clinical decision support, revealing when and how combining data modalities improves healthcare AI performance. The study found that multimodal fusion helps when data is complete but benefits degrade under realistic missing data scenarios, and released an open-source benchmarking toolkit for reproducible evaluation.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 17
🧠Researchers developed BUSD-Agent, an AI framework for breast cancer screening that uses cascaded agents and experience-guided decision-making to reduce unnecessary biopsies. The system achieved a 22% reduction in biopsy referrals while improving diagnostic accuracy through retrieval-based learning from past cases.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Research analyzing physician disagreement in the HealthBench medical AI evaluation dataset finds that 81.8% of disagreement variance is unexplained by observable features, with rubric identity accounting for only 15.8% of variance. The study reveals that physicians agree on clearly good or bad AI outputs but disagree on borderline cases, suggesting structural limits to medical AI evaluation consistency.
AI · Bearish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers developed ClinDet-Bench, a new benchmark that reveals large language models fail to properly identify when they have sufficient information to make clinical decisions. The study shows LLMs make both premature judgments and excessive abstentions in medical scenarios, highlighting safety concerns for AI deployment in healthcare settings.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers developed a framework for analyzing AI diagnostic systems in clinical settings by preserving original AI inferences and comparing them with physician corrections. The study of 21 dermatological cases showed 71.4% exact agreement between AI and physicians, with 100% comprehensive concordance when using structured analysis methods.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers developed a hybrid system combining machine learning ensembles with large language models for heart disease prediction, achieving 96.62% accuracy. The study found that traditional ML models (95.78% accuracy) outperformed standalone LLMs (78.9% accuracy), but combining both approaches yielded the best results for clinical decision-support tools.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠ColoDiff is a new AI framework that uses diffusion models to generate high-quality colonoscopy videos for medical training and diagnosis. The system addresses data scarcity in medical imaging by creating synthetic videos with temporal consistency and precise clinical attribute control, achieving 90% faster generation through optimized sampling.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers developed MedSegLatDiff, a new AI framework combining variational autoencoders with diffusion models for medical image segmentation. The system operates in compressed latent space to reduce computational costs while generating multiple plausible segmentation masks, achieving state-of-the-art performance on skin lesion, polyp, and lung nodule datasets.
AI · Neutral · MIT News – AI · Jan 5 · 6/10 · 4
🧠MIT researchers have developed methods to test AI models used in clinical settings to prevent them from inadvertently revealing anonymized patient health data through memorization. This research addresses a critical privacy and security concern as healthcare AI systems become more prevalent.
AI · Neutral · arXiv – CS AI · Mar 27 · 4/10
🧠Researchers propose a new framework for AI health agents that moves away from siloed, individual-user systems toward collaborative decision mediators that work within multi-stakeholder healthcare relationships. The study demonstrates through a pediatric case study that current AI tools fail to address collaboration gaps between patients, caregivers, and clinicians, proposing instead AI systems that preserve human authority while facilitating shared understanding.
AI · Neutral · arXiv – CS AI · Mar 9 · 5/10
🧠This academic review examines the integration of foundation models and AI agents in computational pathology for medical applications. While AI shows promising performance in diagnosis and treatment prediction tasks, real-world clinical adoption remains limited due to economic, technical, and regulatory challenges.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers developed a framework to analyze how demographic attributes (age, sex, race) can be predicted from brain MRI scans by separating anatomical structure from acquisition-dependent contrast differences. The study found that demographic predictability primarily stems from anatomical variation rather than imaging artifacts, suggesting bias mitigation in medical AI must address both sources.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2
🧠Researchers developed CASR-Net, a deep learning pipeline for automated coronary artery segmentation in X-ray angiograms that combines image preprocessing, UNet-based segmentation, and refinement stages. The system achieved superior performance with 61.43% IoU and 76.10% DSC on public datasets, potentially improving clinical diagnosis of coronary artery disease.
AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 8
🧠Researchers introduce a new framework for evaluating how well multimodal AI models reason about ECG signals by breaking down reasoning into perception (pattern identification) and deduction (logical application of medical knowledge). The framework uses automated code generation to verify temporal patterns and compares model logic against established clinical criteria databases.
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 4
🧠Researchers developed a Noise Removal model to improve precision in clinical entity extraction using BERT-based Named Entity Recognition (NER) systems. The model uses features such as Probability Density Maps to distinguish weak from strong predictions, reducing false positives by 50-90% in clinical NER applications.
AI · Bullish · OpenAI News · Dec 14 · 4/10 · 6
🧠Summer Health has partnered with OpenAI to enhance pediatric healthcare by improving the accuracy of doctor's visit notes. This collaboration aims to reimagine how pediatric medical documentation is handled through AI technology.