The Language-Energy Divide: Measuring Energy Costs of Multilingual LLM Inference
A comprehensive study reveals that multilingual LLM inference consumes vastly different amounts of energy across languages, with Pashto requiring 179 times more energy than English for identical requests. The disparity stems from complex script processing and token generation inefficiency in low-resource languages, compounded by a double penalty where high-energy languages also deliver lower accuracy.
This research exposes a critical inefficiency in how modern AI infrastructure handles linguistic diversity. The 8.3x variation in per-token energy costs and the 179x difference in total energy consumption demonstrate that deploying identical models across languages creates systematic inequities in computational cost. The findings matter because energy consumption translates directly into operational costs, carbon emissions, and service availability in production systems.
The underlying mechanisms reveal a two-layer problem. Complex writing systems, such as those used in Pashto, Arabic, or CJK languages, are associated with higher energy per generated token. At the same time, these languages produce longer token sequences to express equivalent semantic content. Because total energy is the per-token cost multiplied by the number of tokens, the two effects compound rather than merely add. The result is a compounding disadvantage for speakers of low-resource languages, directly contradicting the stated goal of democratizing AI access globally.
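The compounding described above can be made concrete with a small calculation. The sketch below treats total energy as per-token energy times token count and derives the token-count ratio implied by the two reported figures. It assumes the 8.3x per-token figure and the 179x total figure describe the same language pair, which the summary does not confirm; the function names are illustrative only.

```python
def total_energy_j(energy_per_token_j: float, tokens: int) -> float:
    """Total energy as per-token cost multiplied by sequence length."""
    return energy_per_token_j * tokens


def implied_token_ratio(total_ratio: float, per_token_ratio: float) -> float:
    """Token-count ratio implied by a total-energy ratio and a per-token ratio."""
    if per_token_ratio <= 0:
        raise ValueError("per_token_ratio must be positive")
    return total_ratio / per_token_ratio


# Ratios reported in the study. Pairing them for a single language pair
# is an assumption made here for illustration.
TOTAL_RATIO = 179.0
PER_TOKEN_RATIO = 8.3

token_ratio = implied_token_ratio(TOTAL_RATIO, PER_TOKEN_RATIO)
print(f"implied token-count ratio: {token_ratio:.1f}x")
```

Under that assumption, most of the gap would come from sequence length rather than per-token cost, which is why tokenization improvements appear among the suggested remedies.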
The correlation between high energy use and low accuracy creates a concerning market dynamic. Organizations deploying multilingual models face a double cost in certain language markets: serving those users requires disproportionate infrastructure investment while delivering inferior model accuracy. This gives commercial deployments an incentive to favor high-resource languages, potentially widening digital divides.
The study's central recommendation, treating energy as a first-class metric alongside accuracy and latency, could reshape how AI systems are benchmarked and optimized. Model cards and evaluation checklists that omit energy costs give an incomplete picture for deployment decisions. To build equitable multilingual AI systems, infrastructure providers and model developers must address these disparities through better tokenization schemes, script-specific optimizations, and deployment-side efficiency measures.
- Energy consumption for LLM inference varies 179x between languages, with English at 17.6 kJ versus Pashto at 3,147 kJ per request set
- Complex or rare scripts increase per-token energy costs while low-resource languages generate more tokens, creating compounding efficiency penalties
- Languages requiring the highest energy also achieve the lowest task accuracy, creating a double disadvantage in performance and efficiency
- Energy inequity persists across different models, hardware platforms, and task types, indicating systemic rather than isolated issues
- Community adoption of energy metrics in model cards and evaluation checklists is essential for equitable multilingual AI deployment
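As a quick sanity check on the headline figure, the ratio can be recomputed from the two per-request-set energy values quoted above. This is a minimal sketch; the constant and function names are illustrative and not taken from the study.

```python
# Energy per request set as reported in the summary, in kilojoules.
ENGLISH_KJ = 17.6
PASHTO_KJ = 3147.0


def energy_ratio(high_kj: float, low_kj: float) -> float:
    """How many times more energy the costlier language consumes."""
    if low_kj <= 0:
        raise ValueError("low_kj must be positive")
    return high_kj / low_kj


ratio = energy_ratio(PASHTO_KJ, ENGLISH_KJ)
print(f"Pashto vs. English: {ratio:.1f}x")
```

The result is just under 179, consistent with the rounded 179x figure.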