Medical Reasoning with Large Language Models: A Survey and MR-Bench
Researchers present a comprehensive survey of medical reasoning in large language models and introduce MR-Bench, a clinical benchmark derived from real hospital data. The study reveals a significant gap between model performance on exam-style tasks and on authentic clinical decision-making, indicating that robust medical reasoning in safety-critical healthcare applications requires more than factual recall.