AIBearisharXiv – CS AI · Mar 176/10
🧠
Should LLMs, like, Generate How Users Talk? Building Dialect-Accurate Dialog[ue]s Beyond the American Default with MDial
Researchers introduced MDial, the first large-scale framework for generating multi-dialectal conversational data across nine English dialects, revealing that over 80% of English speakers don't use Standard American English. Evaluation of 17 LLMs showed even frontier models achieve under 70% accuracy in dialect identification, with particularly poor performance on non-American dialects.