AIBullish · arXiv CS AI · 6h ago
Expert Divergence Learning for MoE-based Language Models
Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize the differences between routing distributions across data domains, achieving improved performance on 15-billion-parameter models with minimal computational overhead.
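The core idea, maximizing the gap between per-domain routing distributions, can be sketched as an auxiliary loss. This is a hypothetical illustration, not the paper's implementation: the function names and the choice of Jensen-Shannon divergence are assumptions, since the summary only says the method maximizes routing-distribution differences between domains.

```python
# Hypothetical sketch of a domain-divergence routing regularizer.
# Assumptions (not from the paper): JS divergence as the distance,
# per-domain mean routing probabilities as the compared distributions.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-9):
    # KL(p || q) with a small epsilon for numerical stability
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def expert_divergence_loss(router_logits, domain_ids):
    """Negative mean pairwise JS divergence between per-domain mean
    routing distributions; minimizing this loss pushes different
    domains toward different experts."""
    probs = softmax(router_logits)  # (num_tokens, num_experts)
    domains = np.unique(domain_ids)
    means = [probs[domain_ids == d].mean(axis=0) for d in domains]
    pairs = [(i, j) for i in range(len(means))
             for j in range(i + 1, len(means))]
    if not pairs:
        return 0.0
    total = sum(js_divergence(means[i], means[j]) for i, j in pairs)
    return -total / len(pairs)

# Toy check: 8 tokens from two domains routed over 4 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))
domains = np.array([0, 0, 0, 0, 1, 1, 1, 1])
loss = expert_divergence_loss(logits, domains)
```

In practice such a term would be added (scaled by a small coefficient) to the language-modeling loss during pre-training, so the router learns domain-specialized expert assignments without extra forward passes.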