Researchers introduce Luar, a reinforcement learning framework that trains reasoning language models to selectively translate non-English inputs to English only when necessary for reliable reasoning. The approach achieves superior multilingual reasoning performance compared to standard baselines, particularly benefiting low-resource languages while avoiding unnecessary translation overhead.
The multilingual reasoning gap in large language models is a persistent challenge in AI development. Current reasoning language models (RLMs) exhibit significant performance degradation on non-English inputs, primarily because their training data skews heavily toward English reasoning patterns. While machine translation offers a straightforward solution, indiscriminately translating every query introduces computational overhead and potential semantic loss. Luar addresses this tension by training models to make routing decisions: reason directly from the original input when confident, and invoke translation only when direct reasoning looks unreliable.
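The routing idea described above can be illustrated with a minimal sketch. This is not Luar's implementation: in the framework the decision is learned by the model itself through reinforcement learning, whereas here it is stood in for by a hypothetical `wants_translation` callable. The names `RoutedAnswer`, `translate_to_english`, and `reason` are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedAnswer:
    answer: str        # final answer produced by the reasoning step
    translated: bool   # whether the query was translated to English first


def answer_with_selective_translation(
    query: str,
    wants_translation: Callable[[str], bool],     # hypothetical stand-in for the learned decision
    translate_to_english: Callable[[str], str],   # hypothetical translation call
    reason: Callable[[str], str],                 # hypothetical reasoning call
) -> RoutedAnswer:
    """Translate only when the decision function asks for it, then reason."""
    if wants_translation(query):
        # Pay the translation cost only on inputs judged unreliable to read directly.
        return RoutedAnswer(reason(translate_to_english(query)), translated=True)
    # Otherwise reason straight from the original-language input.
    return RoutedAnswer(reason(query), translated=False)
```

An always-translate baseline corresponds to passing `lambda q: True` as the decision function, and a never-translate baseline to `lambda q: False`; selective translation sits between the two.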
This work builds on growing recognition that language understanding failures, rather than reasoning deficits, drive multilingual performance gaps. The reinforcement learning approach optimizes for a pragmatic goal: maximize reasoning accuracy while minimizing unnecessary translation calls. By learning where its own language understanding breaks down, the model gains on both fronts: higher accuracy on inputs it would otherwise misread, and fewer translation calls on inputs it already handles well.
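One simple way to encode that trade-off is a reward that credits a correct answer and subtracts a small cost whenever translation is used. The summary does not state Luar's actual reward, so the form below, the function name `shaped_reward`, and the `translation_penalty` value are all assumptions chosen only to make the objective concrete.

```python
def shaped_reward(
    is_correct: bool,
    used_translation: bool,
    translation_penalty: float = 0.1,  # assumed value, not taken from the source
) -> float:
    """Assumed reward shape: accuracy credit minus a cost for calling translation."""
    accuracy_term = 1.0 if is_correct else 0.0
    cost_term = translation_penalty if used_translation else 0.0
    return accuracy_term - cost_term
```

With a penalty well below 1, a correct answer reached through translation still scores higher than a wrong answer reached directly, so translation stays worthwhile when it is needed. Among correct answers, the one that skipped translation scores highest, which discourages unnecessary calls.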
For developers deploying multilingual reasoning systems, Luar suggests meaningful efficiency gains without sacrificing accuracy. The framework's ability to generalize learned translation behavior to unseen low-resource languages indicates the model captures principled decision-making rather than memorizing language-specific patterns. This has implications for building scalable systems serving global users, particularly in markets where computational resources are constrained. Organizations can potentially reduce inference costs while maintaining performance on critical reasoning tasks across languages.
Future development should explore whether similar boundary-aware frameworks apply to other modalities or specialized domains where selective augmentation provides efficiency-accuracy tradeoffs.
- Luar enables language models to decide when translation is necessary rather than translating all non-English inputs indiscriminately.
- The framework shows particularly strong improvements on low-resource languages, where direct reasoning is least reliable.
- The selective translation approach reduces computational overhead while matching or exceeding the performance of always-translate baselines.
- The model generalizes its learned translation behavior to unseen languages, suggesting principled decision-making beyond memorization.
- The framework addresses a fundamental multilingual reasoning gap rooted in language-understanding failures rather than limitations in reasoning capability.