AI · Bearish · arXiv – CS AI · 8h ago · 7/10
Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA
Researchers introduced DOSEBENCH, a benchmark of 81 over-the-counter (OTC) medication dosing scenarios, to evaluate how well large language models handle safety-critical medical decisions involving temporal reasoning and constraint adherence. Testing four LLMs revealed significant weaknesses in rolling-window calculations, ambiguity handling, and consistency. These gaps matter because incorrect dosing answers pose real health risks.