AI · Bullish · arXiv – CS AI · 18h ago · 7/10
Post-Trained MoE Can Skip Half Experts via Self-Distillation
Researchers introduced ZEDA, a framework that converts fully trained Mixture-of-Experts (MoE) language models into dynamic variants that can skip unnecessary experts, reducing computational requirements by more than 50% with minimal accuracy loss. The method uses self-distillation to adapt post-trained models without retraining from scratch, and it achieves a ~1.20x end-to-end inference speedup on major language models.
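
To make the idea of skipping experts concrete, here is a minimal sketch of per-token dynamic routing. It is not ZEDA's actual mechanism, which the summary does not describe. It swaps in a simple probability threshold as an illustration, and `route_with_skipping`, `k`, and `threshold` are hypothetical names chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_with_skipping(logits, k=2, threshold=0.2):
    """Pick up to k experts for one token, skipping low-probability ones.

    Assumption (illustrative only, not from the paper): an expert in the
    top-k is skipped when its router probability falls below `threshold`.
    The top-1 expert is always kept so every token is processed.
    Returns a list of (expert_index, renormalized_weight) pairs.
    """
    probs = softmax(logits)
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = top_k[:1] + [i for i in top_k[1:] if probs[i] >= threshold]
    norm = sum(probs[i] for i in kept)
    return [(i, probs[i] / norm) for i in kept]


# A confident router activates one expert; an undecided one activates two.
confident = route_with_skipping([4.0, 0.0, 0.0, 0.0])
undecided = route_with_skipping([2.0, 2.0, 0.0, 0.0])
```

In this sketch the compute saved for a token is the share of its `k` expert slots left unused. A static top-k router would always run all `k` experts regardless of how confident the router is.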