AI · Bearish · arXiv – CS AI · 5h ago
TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
Researchers have developed TrustMH-Bench, a comprehensive framework to evaluate the trustworthiness of Large Language Models (LLMs) in mental health applications. Testing revealed that both general-purpose and specialized mental health LLMs, including advanced models like GPT-5.1, significantly underperform across critical trustworthiness dimensions in mental health scenarios.