y0news

#medical-ai News & Analysis

166 articles tagged with #medical-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · OpenAI News · Jan 7 · 7/10 · 5

Introducing ChatGPT Health

OpenAI has launched ChatGPT Health, a specialized version of its AI assistant designed to securely integrate with health data and applications. The platform emphasizes privacy protections and incorporates physician-informed design principles for healthcare applications.

AI · Bullish · Google DeepMind Blog · Oct 23 · 7/10 · 3

How a Gemma model helped discover a new potential cancer therapy pathway

Google has launched a new 27-billion-parameter foundation model for single-cell analysis, built on the Gemma family of open models. The model has reportedly helped discover a new potential cancer therapy pathway, a practical medical application of the technology.

AI · Bullish · Google Research Blog · Jul 9 · 7/10 · 8

MedGemma: Our most capable open models for health AI development

Google has released MedGemma, which it describes as its most capable open models for health AI development. The release makes specialized medical AI tools more accessible to developers and researchers in the healthcare sector.

AI · Bullish · OpenAI News · May 12 · 7/10 · 6

Introducing HealthBench

HealthBench is a new evaluation benchmark for AI in healthcare that assesses models in realistic clinical scenarios. Developed with input from over 250 physicians, it aims to establish standardized performance and safety metrics for healthcare AI models.

AI · Bullish · Wall Street Journal – Tech · Jan 27 · 7/10 · 3

Reid Hoffman Raises $24.6 Million for AI Cancer-Research Startup

LinkedIn co-founder Reid Hoffman has raised $24.6 million to launch Manas AI, a startup focused on AI-driven cancer research. The venture partners with Siddhartha Mukherjee, renowned oncologist and author of 'The Emperor of All Maladies,' combining Hoffman's tech expertise with medical authority.

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT

Researchers propose INFORM-CT, an AI framework combining large language models and vision-language models to automate detection and reporting of incidental findings in abdominal CT scans. The system uses a planner-executor approach that outperforms traditional manual inspection and existing pure vision-based models in accuracy and efficiency.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

SymptomWise introduces a deterministic reasoning framework that separates language understanding from diagnostic inference in AI-driven medical systems, combining expert-curated knowledge with constrained LLM use to improve reliability and reduce hallucinations. The system achieved 88% accuracy in placing correct diagnoses in top-five differentials on challenging pediatric neurology cases, demonstrating how structured approaches can enhance AI safety in critical domains.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Front-End Ethics for Sensor-Fused Health Conversational Agents: An Ethical Design Space for Biometrics

Researchers propose an ethical framework for sensor-fused health AI agents that combine biometric data with large language models. The paper identifies critical risks at the user-facing layer where sensor data is translated into health guidance, arguing that the perceived objectivity of biometrics can mask AI errors and turn them into harmful medical directives.

AI · Bearish · arXiv – CS AI · 6d ago · 6/10

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

Researchers introduce MedDialBench, a comprehensive benchmark testing how large language models maintain diagnostic accuracy when patients exhibit adversarial behaviors across five dimensions. The study reveals that fabricating symptoms causes 1.7-3.4x larger accuracy drops than withholding information, with worst-case performance degradation ranging from 38.8 to 54.1 percentage points across tested models.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

VERT: Reliable LLM Judges for Radiology Report Evaluation

Researchers introduced VERT, a new LLM-based metric for evaluating radiology reports that shows up to 11.7% better correlation with radiologist judgments than existing methods. The study also shows that fine-tuned smaller models can achieve significant performance gains while running inference up to 37.2 times faster.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models

Photon is a new framework that efficiently processes 3D medical imaging for AI visual question answering by using variable-length token sequences and adaptive compression. The system reduces computational costs while maintaining accuracy through instruction-conditioned token scheduling and custom gradient propagation techniques.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

DeepFAN, a transformer-based deep learning model for human-artificial intelligence collaborative assessment of incidental pulmonary nodules in CT scans: a multi-reader, multi-case trial

DeepFAN, a transformer-based AI model, achieved 93.9% diagnostic accuracy for lung nodule classification and significantly improved junior radiologists' performance by 10.9% in clinical trials. The model was trained on over 10,000 pathology-confirmed nodules and validated across 400 cases at three medical institutions.

🏢 Meta

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset

Researchers successfully fine-tuned LLaMA 3.1-8B for medical transcription in Finnish, a low-resource language, achieving strong semantic similarity despite low n-gram overlap. The study used simulated clinical conversations from students and demonstrates the feasibility of privacy-oriented domain-specific language models for clinical documentation in underrepresented languages.

AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders

Researchers benchmarked 20 multimodal AI models on neuroimaging tasks using MRI and CT scans, finding that while technical attributes like imaging modality are nearly solved, diagnostic reasoning remains challenging. Gemini-2.5-Pro and GPT-5-Chat showed strongest diagnostic performance, while open-source MedGemma-1.5-4B demonstrated promising results under few-shot prompting.

🏢 Meta · 🧠 GPT-5 · 🧠 Gemini

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Learning To Guide Human Decision Makers With Vision-Language Models

Researchers introduce Learning to Guide (LTG), a new AI framework where machines provide interpretable guidance to human decision-makers rather than making automated decisions. The SLOG approach transforms vision-language models into guidance generators using human feedback, showing promise in medical diagnosis applications.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

MedAidDialog: A Multilingual Multi-Turn Medical Dialogue Dataset for Accessible Healthcare

Researchers have introduced MedAidDialog, a multilingual medical dialogue dataset covering seven languages, and developed MedAidLM, a conversational AI model for preliminary medical consultations. The system uses parameter-efficient fine-tuning on small language models to enable deployment without high-end computational infrastructure while incorporating patient context for personalized consultations.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence

Researchers introduce OpenHospital, a new interactive arena designed to develop and benchmark Large Language Model-based Collective Intelligence through physician-patient agent interactions. The platform uses a data-in-agent-self paradigm to rapidly enhance AI agent capabilities while providing evaluation metrics for medical proficiency and system efficiency.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation

Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models

Researchers introduced QuarkMedBench, a new benchmark for evaluating large language models on real-world medical queries using over 20,000 queries across clinical care scenarios. The benchmark addresses limitations of current medical AI evaluations that rely on multiple-choice questions by using an automated scoring framework that achieves 91.8% concordance with clinical expert assessments.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

DeCode: Decoupling Content and Delivery for Medical QA

Researchers introduce DeCode, a training-free framework that adapts large language models to provide better contextualized medical answers by decoupling content from delivery. The system significantly improves clinical question answering performance, boosting zero-shot results from 28.4% to 49.8% on medical benchmarks.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools

Researchers developed UniPrompt-CL, a new continual learning method specifically designed for medical AI that addresses the limitations of existing approaches when applied to medical data. The method uses a unified prompt pool design and regularization to achieve better performance while reducing computational costs, improving accuracy by 1-3 percentage points in domain-incremental learning settings.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

Researchers developed an AI system that can detect fetal orofacial clefts in ultrasound images with over 93% sensitivity and 95% specificity, matching senior radiologist performance. The system was trained on 45,139 ultrasound images from 9,215 fetuses across 22 hospitals and can also improve junior radiologist diagnostic accuracy by 6%.

🏢 Microsoft

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Researchers introduced RAMoEA-QA, a new AI system that uses hierarchical specialization to answer questions about respiratory audio recordings from mobile devices. The system employs a two-stage routing approach with Audio Mixture-of-Experts and Language Mixture-of-Adapters to handle diverse recording conditions and query types, achieving 0.72 test accuracy compared to 0.61-0.67 for existing baselines.
