#moe-routing News & Analysis

5 articles tagged with #moe-routing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

Researchers introduce ProbMoE, a probabilistic routing framework for training Mixture-of-Experts models that replaces discrete, non-differentiable top-k routing with a differentiable probabilistic approach. The method achieves comparable or improved performance while enabling dynamic expert allocation and better expert utilization across various benchmarks.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training

Researchers introduce DTop-p, a dynamic routing mechanism for Mixture-of-Experts (MoE) architectures that adaptively selects experts based on token difficulty while maintaining controlled computational costs. The approach outperforms traditional Top-k routing and fixed Top-p methods by using a Proportional-Integral controller to dynamically adjust probability thresholds, demonstrating consistent improvements across large language models and diffusion transformers.
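
To make the idea concrete, here is a minimal sketch of top-p expert selection with a Proportional-Integral update of the threshold. It is an illustration under assumptions, not the paper's implementation: the function `top_p_experts`, the class `PIThreshold`, the gains `kp` and `ki`, and the choice to track "average experts per token" as the controlled quantity are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_experts(logits, p):
    """Return the smallest set of expert indices whose probability mass reaches p.

    Unlike top-k, the number of selected experts varies per token:
    a peaked distribution needs few experts, a flat one needs more.
    """
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return chosen

class PIThreshold:
    """Hypothetical PI controller that steers the top-p threshold toward
    a target average number of active experts per token."""

    def __init__(self, target, p0=0.5, kp=0.05, ki=0.01):
        self.target = target
        self.p0 = p0          # baseline threshold (assumed value)
        self.kp = kp          # proportional gain (assumed value)
        self.ki = ki          # integral gain (assumed value)
        self.integral = 0.0
        self.p = p0

    def update(self, observed):
        # Positive error means too few experts are active, so raise p.
        error = self.target - observed
        self.integral += error
        raw = self.p0 + self.kp * error + self.ki * self.integral
        self.p = min(1.0, max(0.0, raw))  # keep the threshold a valid probability
        return self.p
```

In a training loop one would presumably call `update` once per step with the measured average expert count and pass the returned threshold to `top_p_experts` for the next batch.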

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Researchers propose Manifold Power Iteration (MPI), a novel router redesign method for Mixture-of-Experts models that aligns router rows with principal singular directions of associated experts. The approach uses a "Power-then-Retract" paradigm and demonstrates improved MoE model effectiveness across scales from 1B to 11B parameters.
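
For readers unfamiliar with the underlying tool, the following is a minimal sketch of plain power iteration for the principal right singular direction of a weight matrix. It is not the MPI algorithm itself: the summary does not specify the manifold or the retraction, so unit-norm normalization stands in for the "retract" step as an assumption, and `principal_direction` is a hypothetical helper name.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

def principal_direction(W, steps=50):
    """Approximate the top right singular vector of W by power iteration.

    Each step applies W^T W (the "power" step) and then rescales the
    result to unit length (used here as a stand-in "retract" step).
    """
    Wt = transpose(W)
    # Uniform start; it must not be orthogonal to the target direction.
    v = normalize([1.0] * len(W[0]))
    for _ in range(steps):
        v = matvec(Wt, matvec(W, v))  # power step
        v = normalize(v)              # retract onto the unit sphere
    return v

# Toy expert matrix whose dominant direction is the first coordinate axis.
v = principal_direction([[3.0, 0.0], [0.0, 1.0]])
```

Aligning a router row with such a direction would then amount to pointing that row along `v`; how the paper does this in practice is not covered by the summary.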

AI · Neutral · arXiv – CS AI · May 29 · 6/10

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

Researchers propose a mathematical model explaining how Mixture-of-Experts (MoE) neural networks can suddenly shift from balanced to imbalanced expert utilization. The model reveals a bifurcation mechanism where increased feedback strength triggers abrupt transitions between stable states, providing theoretical insight into a practical problem affecting large language models and distributed AI systems.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

Researchers propose RA-MoE, a fine-tuning framework that optimizes Mixture-of-Experts language models for multilingual tasks by aligning target-language routing patterns with English task performance in middle layers. The approach outperforms standard fine-tuning across multiple models and languages, addressing a critical gap in adapting efficient LLM architectures for non-English downstream applications.