#moe-models News & Analysis

6 articles tagged with #moe-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

6 articles

AIBullisharXiv – CS AI · Jun 27/10

🧠

MESA: Improving MoE Safety Alignment via Decentralized Expertise

Researchers propose MESA, a new safety alignment framework for Mixture-of-Experts language models that addresses a critical vulnerability where safety capabilities concentrate in few experts. The method uses Optimal Transport theory to strategically distribute safety responsibilities across multiple experts while maintaining model performance and computational efficiency.

AIBullisharXiv – CS AI · May 287/10

🧠

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Researchers present a systematic study of Attention-FFN Disaggregation (AFD), a technique that separates attention and expert layers across different GPU groups to optimize inference serving for Mixture-of-Experts language models. The framework demonstrates that AFD enables 4k tokens/s throughput on DeepSeek-V3.2 under strict latency constraints where traditional disaggregation approaches fail, providing design principles for scaling LLM infrastructure.

AIBullisharXiv – CS AI · May 127/10

🧠

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Researchers present SlimQwen, a systematic study of compression techniques for mixture-of-experts (MoE) language models during pretraining. The work demonstrates that pruning pretrained MoE models outperforms training smaller architectures from scratch, and proposes progressive pruning combined with knowledge distillation as the most effective compression strategy, successfully compressing Qwen3-Next-80A3B to 23A2B while maintaining competitive performance.

AIBullisharXiv – CS AI · Apr 67/10

🧠

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

Researchers analyzed data movement patterns in large-scale Mixture of Experts (MoE) language models (200B-1000B parameters) to optimize inference performance. Their findings led to architectural modifications achieving 6.6x speedups on wafer-scale GPUs and up to 1.25x improvements on existing systems through better expert placement algorithms.

🏢 Hugging Face

AIBullisharXiv – CS AI · Apr 76/10

🧠

Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

Researchers discovered that multilingual MoE AI models exhibit 'Language Routing Isolation,' where high and low-resource languages activate different expert sets. They developed RISE, a framework that exploits this isolation to improve low-resource language performance by up to 10.85% F1 score while preserving other language capabilities.

AIBullisharXiv – CS AI · Mar 27/1018

🧠

Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling

Researchers propose Semantic Parallelism, a new framework called Sem-MoE that significantly improves efficiency of large language model inference by optimizing how AI models distribute computational tasks across multiple devices. The system reduces communication overhead between devices by 'collocating' frequently-used model components with their corresponding data, achieving superior throughput compared to existing solutions.