AINeutralarXiv – CS AI · 15h ago6/10
🧠
A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks
Researchers introduce MeDial-Speech, a new 111+ hour speech dataset for training medical AI systems to conduct patient consultations across four health conditions. The study benchmarks state-of-the-art LLMs including Claude Sonnet 4, GPT-5 mini, and DeepSeek-V3, revealing that while Claude Sonnet 4 achieves 71-75% accuracy in medical dialogue tasks, all models exhibit significant overconfidence in their probabilistic predictions.
🏢 Hugging Face🧠 GPT-5🧠 Claude