AIBearisharXiv โ CS AI ยท 10h ago6/10
๐ง
Should LLMs, like, Generate How Users Talk? Building Dialect-Accurate Dialog[ue]s Beyond the American Default with MDial
Researchers introduced MDial, the first large-scale framework for generating multi-dialectal conversational data across nine English dialects, revealing that over 80% of English speakers don't use Standard American English. Evaluation of 17 LLMs showed even frontier models achieve under 70% accuracy in dialect identification, with particularly poor performance on non-American dialects.