Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
Researchers introduce MIDI, a multilingual idiom dataset covering 18 languages across high-, medium-, and low-resource tiers. Evaluations on it reveal that state-of-the-art NLP models struggle significantly with idiomatic expressions, particularly in low-resource languages and when an idiom is meant literally rather than figuratively. The findings expose fundamental gaps in how current AI systems handle context-dependent meaning across different linguistic communities.
The MIDI dataset addresses a critical blind spot in multilingual NLP research: comprehension of idiomatic expressions at scale. Idioms are expressions whose meaning diverges from the literal composition of their words, so interpreting them requires cultural knowledge and contextual reasoning. Prior benchmarks evaluated idioms in isolation, leaving open how models behave when idioms appear in natural conversational settings.
This research fits a broader pattern of AI capability disparities across language communities. While transformer-based models have achieved impressive results on English-centric benchmarks, their performance systematically declines as the resources available for a language diminish. The MIDI findings quantify this gap for one specific phenomenon and show that speakers of low-resource languages face compounded challenges: on top of generally weaker performance, models trained on limited data in those languages struggle with figurative language, a core tool of everyday human communication.
For AI developers and companies building multilingual systems, this work suggests that current models lack robust mechanisms for reasoning about context-dependent meaning. The paper's intervention analysis, which aims to distinguish memorization from reasoning, matters because it indicates whether models genuinely understand language or merely pattern-match examples seen in training. That literal interpretations prove harder than figurative ones suggests models may rely on frequency-based shortcuts, such as defaulting to an idiom's more common figurative reading, rather than on compositional semantics.
Looking forward, this research will likely catalyze efforts to develop more sophisticated contextualization methods and larger-scale idiom datasets for underrepresented languages. Industry applications spanning machine translation, conversational AI, and content moderation depend on accurate idiom handling, making these limitations economically significant.
- State-of-the-art NLP models perform substantially worse on idioms in low-resource languages than in high-resource languages.
- Literal interpretations of idioms are harder for AI models than figurative ones, contrary to the intuitive assumption that literal meaning is easier.
- The MIDI dataset provides the first large-scale multilingual idiom evaluation that spans conversational contexts as well as isolated sentences.
- Intervention analysis probes whether models process idiomatic expressions through memorization or genuine reasoning, and the results suggest current models lack robust reasoning about idiomatic meaning.
- Conversational context improves model performance but does not eliminate systematic disparities across language resource tiers.