#mental-health-ai · 1 article
AI · Neutral · arXiv – CS AI · 7h ago · 7/10
🧠

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

Researchers demonstrate that standard LLM-as-a-judge methods achieve only 52% accuracy in detecting hallucinations and omissions in mental health chatbot responses, an unacceptable failure rate for high-risk healthcare contexts. A hybrid framework combining human domain expertise with machine learning features performs substantially better (F1 scores of 0.717–0.849), suggesting that transparent, interpretable approaches outperform black-box LLM evaluation in safety-critical applications.