Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models

arXiv – CS AI | Mir Tafseer Nayeem, Davood Rafiei
🤖AI Summary

A new study reports that major large language models are systematically biased toward American English over British English at three stages: training data, tokenization, and generated output. The authors introduce DiAlign, a method for measuring dialectal alignment, and find evidence of linguistic homogenization that could undermine global AI equity.

Key Takeaways
  • Six major pretraining corpora show systematic skew toward American English over British English varieties.
  • LLM tokenizers impose higher segmentation costs on British English forms compared to American English.
  • Generative AI models consistently prefer American English in their outputs despite global English diversity.
  • The study introduces DiAlign, a training-free method for measuring dialectal bias in language models.
  • Researchers warn of linguistic homogenization and epistemic injustice in global AI deployment.
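
To make the first two takeaways concrete, here is a minimal Python sketch of the two kinds of measurement they describe: how often a text uses American versus British spellings, and how many subword pieces each form costs under a tokenizer. It is not DiAlign, whose details this summary does not give. The variant list (VARIANT_PAIRS), the toy vocabulary (TOY_VOCAB), and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical (American, British) spelling pairs; NOT the paper's lexicon.
VARIANT_PAIRS = [
    ("color", "colour"),
    ("organize", "organise"),
    ("center", "centre"),
]

# Toy subword vocabulary (an assumption): American forms are whole tokens,
# British forms are only reachable through smaller pieces.
TOY_VOCAB = {"color", "organize", "center",
             "col", "our", "organ", "ise", "cent", "re"}


def variant_counts(text):
    """Return (american, british) whole-word counts for the listed pairs."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    american = sum(words[a] for a, _ in VARIANT_PAIRS)
    british = sum(words[b] for _, b in VARIANT_PAIRS)
    return american, british


def american_share(text):
    """Fraction of variant hits that are American; None if there are no hits."""
    american, british = variant_counts(text)
    total = american + british
    return american / total if total else None


def greedy_segment(word, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation with single-character fallback."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Accept the longest vocabulary match, or one character as a last resort.
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def mean_pieces(words):
    """Average number of subword pieces per word (a simple segmentation cost)."""
    return sum(len(greedy_segment(w)) for w in words) / len(words)


sample = "The colour of the centre is a color we organise."
print(variant_counts(sample), american_share(sample))
print(mean_pieces([a for a, _ in VARIANT_PAIRS]),
      mean_pieces([b for _, b in VARIANT_PAIRS]))
```

A real measurement would swap the toy vocabulary for an actual model tokenizer and the three pairs for a curated dialect lexicon. The toy vocabulary here is built so that British forms split into more pieces, so the output illustrates the calculation and is not evidence for the finding.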