#medical-qa News & Analysis

4 articles tagged with #medical-qa. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA

Researchers introduced DOSEBENCH, a benchmark of 81 OTC medication dosing scenarios, to evaluate how well large language models handle safety-critical medical decisions involving temporal reasoning and constraint adherence. Tests of four LLMs revealed significant weaknesses in rolling-window calculations, ambiguity handling, and consistency. These gaps matter in a use case where incorrect answers pose real health risks.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

Researchers propose EAPO, an entropy-driven adaptive method for training large reasoning models on open-ended question answering tasks. The approach dynamically adjusts the weighting of positive and negative samples during reinforcement learning and improves performance on medical QA datasets by balancing response diversity with stability.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization

Researchers demonstrate that automated evaluation metrics can reliably assess AI-generated responses to patient hospitalization questions, matching human expert ratings across 2,800 responses from 28 AI systems. This approach addresses the scalability limitations of manual expert review while maintaining accuracy across three key dimensions: question answering, clinical evidence use, and medical knowledge application.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

DeCode: Decoupling Content and Delivery for Medical QA

Researchers introduce DeCode, a training-free framework that adapts large language models to provide better contextualized medical answers by decoupling content from delivery. The system significantly improves clinical question answering performance, boosting zero-shot results from 28.4% to 49.8% on medical benchmarks.
