AIBullisharXiv – CS AI · 6h ago7/10
🧠
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
Researchers introduce DTop-p, a dynamic routing mechanism for Mixture-of-Experts (MoE) architectures that adaptively selects experts based on token difficulty while maintaining controlled computational costs. The approach outperforms traditional Top-k routing and fixed Top-p methods by using a Proportional-Integral controller to dynamically adjust probability thresholds, demonstrating consistent improvements across large language models and diffusion transformers.