AIBullish · arXiv – CS AI · 6h ago

Expert Divergence Learning for MoE-based Language Models

Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize the difference between routing distributions across data domains, achieving better performance on 15-billion-parameter models with minimal computational overhead.
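
The summary does not spell out the exact training objective, so the snippet below is only a minimal sketch of one plausible reading: compute each domain's average routing distribution and penalize similarity between domains via pairwise symmetric KL divergence. The function name `expert_divergence_loss`, the symmetric-KL formulation, and all tensor shapes are assumptions for illustration, not the paper's actual loss.

```python
# Hypothetical sketch of a domain-divergence regularizer for an MoE router.
# Assumed reading of "maximize routing distribution differences between data
# domains"; the paper's real objective may differ.
import torch


def expert_divergence_loss(router_logits: torch.Tensor,
                           domain_ids: torch.Tensor,
                           num_domains: int) -> torch.Tensor:
    """Return a scalar that grows as domains route to more distinct experts.

    router_logits: (num_tokens, num_experts) raw router scores.
    domain_ids:    (num_tokens,) integer domain label for each token.
    """
    probs = router_logits.softmax(dim=-1)  # per-token routing distribution

    # Mean routing distribution for each domain that appears in the batch.
    domain_dists = []
    for d in range(num_domains):
        mask = domain_ids == d
        if mask.any():
            domain_dists.append(probs[mask].mean(dim=0))
    if len(domain_dists) < 2:
        return router_logits.new_zeros(())

    # Average symmetric KL divergence over all domain pairs.
    total = router_logits.new_zeros(())
    pairs = 0
    for i in range(len(domain_dists)):
        for j in range(i + 1, len(domain_dists)):
            p = domain_dists[i].clamp_min(1e-9)
            q = domain_dists[j].clamp_min(1e-9)
            kl_pq = (p * (p / q).log()).sum()
            kl_qp = (q * (q / p).log()).sum()
            total = total + 0.5 * (kl_pq + kl_qp)
            pairs += 1
    return total / pairs
```

In such a setup, this term would presumably be negated (or subtracted) so that training maximizes it, and scaled by a small coefficient alongside the usual language-modeling and load-balancing losses so it does not dominate the optimization.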