AI · Neutral · arXiv – CS AI · 7h ago · 7/10
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
Researchers introduce EHRBench, an automated benchmark containing nearly 1 million QA items derived from real patient electronic health records to evaluate large language models on clinical decision-making tasks. The framework combines LLM-based template generation with knowledge-base verification to assess model performance on diagnosis, treatment, and prognosis at scale while maintaining reliability.