y0news
#physician-disagreement · 1 article
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5

Decomposing Physician Disagreement in HealthBench

A study of physician disagreement in HealthBench, a dataset for evaluating medical AI, finds that 81.8% of the variance in disagreement is not explained by observable features, and that rubric identity accounts for only 15.8%. Physicians tend to agree when an AI output is clearly good or clearly bad but disagree on borderline cases. The authors argue this points to structural limits on how consistently medical AI can be evaluated.
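To make a "share of variance" figure concrete, here is a minimal sketch of a one-factor variance decomposition: the between-group sum of squares divided by the total sum of squares (eta-squared). This is not the study's method. The function name `variance_share_by_group`, the toy scores, and the assumption of one disagreement score per item grouped by a label such as rubric are all hypothetical and for illustration only.

```python
from collections import defaultdict


def variance_share_by_group(scores, groups):
    """Return the fraction of total variance in `scores` explained by `groups`.

    Illustrative sketch only: computes between-group sum of squares divided
    by total sum of squares (eta-squared) for a single grouping factor.
    """
    if not scores or len(scores) != len(groups):
        raise ValueError("scores and groups must be non-empty and equal length")

    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    if total_ss == 0:
        return 0.0  # no variance at all, so nothing to explain

    # Collect scores per group label (e.g. a hypothetical rubric ID).
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    # Between-group SS: group size times squared deviation of group mean.
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return between_ss / total_ss


if __name__ == "__main__":
    # Hypothetical disagreement scores for items under two rubrics.
    scores = [0.1, 0.3, 0.2, 0.6, 0.8, 0.7]
    rubrics = ["r1", "r1", "r1", "r2", "r2", "r2"]
    explained = variance_share_by_group(scores, rubrics)
    print(f"explained by rubric: {explained:.3f}, unexplained: {1 - explained:.3f}")
```

In this toy data the rubric label explains most of the variance; a result like the study's 15.8% would mean most of the spread sits within rubrics rather than between them.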