CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning
Researchers introduce CLewR, a curriculum learning strategy that improves machine translation performance in large language models by reordering training data from easy to hard examples with periodic restarts. The approach demonstrates consistent improvements across multiple model families and preference optimization techniques, addressing a previously underexplored aspect of LLM training methodology.
The research addresses a fundamental but overlooked aspect of machine translation training: the sequence in which data samples are presented during learning. By implementing curriculum learning with restarts (CLewR), the team tackles catastrophic forgetting, a phenomenon in which models lose proficiency on previously learned easy examples as they are exposed to harder training data. This is particularly relevant for preference optimization algorithms, which have recently shown promise in improving multilingual translation capabilities.
The core innovation lies in the cyclical approach to curriculum learning. Rather than a single progression from easy to hard examples, CLewR iterates this sequence multiple times, effectively reinforcing foundational knowledge while building toward more complex patterns. This methodology builds on established curriculum learning principles but adapts them specifically for preference optimization contexts, where sample ordering has received minimal attention despite its potential impact.
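The cyclical ordering described above can be sketched in a few lines. This is a minimal illustration, assuming a simple length-based difficulty proxy and a flat replay of the easy-to-hard sweep; the paper's actual difficulty scoring, restart scheduling, and batching details are not specified here, so the function and field names below are hypothetical.

```python
def difficulty(example):
    # Hypothetical difficulty proxy: treat longer source sentences as harder.
    # The paper's actual difficulty metric may differ.
    return len(example["source"].split())

def clewr_schedule(dataset, num_restarts=3):
    """Sketch of curriculum learning with restarts: sort examples
    easy-to-hard, then replay that full sweep `num_restarts` times so
    easy examples are periodically revisited, which is the mechanism
    meant to counter forgetting of early material."""
    ordered = sorted(dataset, key=difficulty)
    schedule = []
    for _ in range(num_restarts):
        schedule.extend(ordered)  # one easy-to-hard pass per restart
    return schedule

# Toy preference-style dataset: (source, chosen, rejected) triples.
data = [
    {"source": "hello world", "chosen": "...", "rejected": "..."},
    {"source": "a much longer and harder sentence to translate", "chosen": "...", "rejected": "..."},
    {"source": "short", "chosen": "...", "rejected": "..."},
]

schedule = clewr_schedule(data, num_restarts=2)
```

In practice the schedule would feed a preference optimizer (e.g. DPO-style training) in this order rather than a shuffled one; the key design choice is that each restart begins again from the easiest examples instead of continuing only with hard ones.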
The validation across Gemma2, Qwen2.5, and Llama3.1 models demonstrates broad applicability rather than optimization for a single architecture. This consistency suggests the approach captures genuine improvements in learning efficiency rather than model-specific artifacts. For developers working with large language models, this research provides a practical mechanism to enhance translation quality without requiring architectural changes or additional computational overhead.
Looking forward, the public code release enables immediate adoption across the community. The framework's generality suggests potential applications beyond machine translation—any preference optimization task might benefit from strategic sample ordering with restarts. Future work could explore whether similar patterns apply to other domains like instruction-following or alignment tasks.
- CLewR curriculum learning with restarts consistently improves machine translation performance across multiple LLM families
- Strategic data sample ordering mitigates catastrophic forgetting of easy examples during preference optimization training
- The approach is model-agnostic and compatible with various state-of-the-art preference optimization algorithms
- Public code availability enables immediate community adoption and validation
- Results suggest sample ordering deserves greater attention as a general optimization strategy for LLM training