Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models
🤖 AI Summary
A new study finds that major large language models exhibit systematic bias toward American English over British English across training data, tokenization, and model outputs. It introduces DiAlign, a training-free method for measuring dialectal alignment, and reports evidence of linguistic homogenization that could undermine global AI equity.
Key Takeaways
- Six major pretraining corpora show a systematic skew toward American English over British English varieties.
- LLM tokenizers impose higher segmentation costs on British English forms than on their American English counterparts.
- Generative models consistently prefer American English in their outputs despite the global diversity of English.
- The study introduces DiAlign, a training-free method for measuring dialectal bias in language models.
- The researchers warn of linguistic homogenization and epistemic injustice in global AI deployment.
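The "segmentation cost" finding above refers to how many subword tokens a tokenizer needs to encode a word: a dialect whose spellings are absent from the vocabulary gets split into more pieces. The sketch below illustrates this with a greedy longest-match tokenizer over a small hypothetical vocabulary (the vocabulary, word pairs, and function names are illustrative assumptions, not the paper's DiAlign method or any real model's tokenizer).

```python
def greedy_tokenize(word, vocab):
    """Segment a word with greedy longest-match (WordPiece-style) lookup.

    Unknown stretches fall back to single characters, so every word
    can be segmented; more pieces = higher segmentation cost.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking to 1 char.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single chars always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical vocabulary: American spellings are whole tokens,
# British spellings are not and must be split into sub-pieces.
vocab = {"color", "analyze", "col", "our", "analy", "se"}

pairs = [("color", "colour"), ("analyze", "analyse")]  # (US, UK) spellings
for us, uk in pairs:
    cost_us = len(greedy_tokenize(us, vocab))
    cost_uk = len(greedy_tokenize(uk, vocab))
    print(f"{us}: {cost_us} token(s)  vs  {uk}: {cost_uk} token(s)")
```

Under this toy vocabulary each American form encodes as one token while its British counterpart costs two, mirroring the asymmetry the study measures at scale on real tokenizers.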
#llm-bias #english-variants #ai-ethics #linguistic-equity #model-training #dialectal-bias #postcolonial-ai #foundation-models
Read Original → via arXiv – CS AI