y0news
AnalyticsDigestsSourcesRSSAICrypto
#healthbench2 articles
2 articles
AIBullishOpenAI News ยท May 127/106
๐Ÿง 

Introducing HealthBench

HealthBench is a new evaluation benchmark for AI in healthcare that assesses models in realistic clinical scenarios. Developed with input from over 250 physicians, it aims to establish standardized performance and safety metrics for healthcare AI models.

AINeutralarXiv โ€“ CS AI ยท Feb 276/105
๐Ÿง 

Decomposing Physician Disagreement in HealthBench

Research analyzing physician disagreement in HealthBench medical AI evaluation dataset finds that 81.8% of disagreement variance is unexplained by observable features, with rubric identity accounting for only 15.8% of variance. The study reveals physicians agree on clearly good or bad AI outputs but disagree on borderline cases, suggesting structural limits to medical AI evaluation consistency.