Hierarchical Mixture-of-Experts with Two-Stage Optimization
Researchers introduce Hi-MoE, a hierarchical Mixture-of-Experts framework that addresses a fundamental trade-off in sparse MoE routing between balanced expert utilization and expert specialization. It does so through a two-stage optimization scheme: load balancing across expert groups, followed by expert specialization within each group. Evaluated on large-scale NLP and vision tasks, Hi-MoE achieves a 5.6% perplexity improvement and more balanced expert utilization than existing methods.
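The summary does not include code, so the following is a minimal PyTorch sketch of the two-stage routing idea under stated assumptions: a group-level gate trained with a Switch-style load-balancing auxiliary loss (stage 1), and per-group gates that pick a single specialized expert (stage 2). All names here (HiMoELayer, num_groups, experts_per_group, and the loss form) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HiMoELayer(nn.Module):
    """Illustrative two-level MoE layer: route tokens to a group, then to an expert in that group."""

    def __init__(self, d_model, num_groups=4, experts_per_group=4, d_ff=None):
        super().__init__()
        d_ff = d_ff or 4 * d_model
        self.num_groups = num_groups
        self.experts_per_group = experts_per_group
        # Stage 1: route each token to a group (kept balanced by an auxiliary loss).
        self.group_gate = nn.Linear(d_model, num_groups)
        # Stage 2: route within the chosen group to a specialized expert.
        self.expert_gates = nn.ModuleList(
            [nn.Linear(d_model, experts_per_group) for _ in range(num_groups)]
        )
        self.experts = nn.ModuleList(
            [
                nn.ModuleList(
                    [
                        nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                        for _ in range(experts_per_group)
                    ]
                )
                for _ in range(num_groups)
            ]
        )

    def forward(self, x):
        # x: (num_tokens, d_model), flattened over batch and sequence.
        group_logits = self.group_gate(x)                       # (T, G)
        group_probs = F.softmax(group_logits, dim=-1)
        group_idx = group_probs.argmax(dim=-1)                  # hard top-1 group per token

        # Stage 1 auxiliary loss: Switch-Transformer-style load balancing over groups.
        frac_tokens = F.one_hot(group_idx, self.num_groups).float().mean(dim=0)
        frac_probs = group_probs.mean(dim=0)
        balance_loss = self.num_groups * torch.sum(frac_tokens * frac_probs)

        out = torch.zeros_like(x)
        for g in range(self.num_groups):
            mask = group_idx == g
            if not mask.any():
                continue
            xg = x[mask]
            # Stage 2: top-1 expert within the group; gate probabilities keep routing differentiable.
            expert_probs = F.softmax(self.expert_gates[g](xg), dim=-1)   # (Tg, E)
            expert_idx = expert_probs.argmax(dim=-1)
            yg = torch.zeros_like(xg)
            for e in range(self.experts_per_group):
                emask = expert_idx == e
                if emask.any():
                    w = expert_probs[emask, e].unsqueeze(-1)
                    yg[emask] = w * self.experts[g][e](xg[emask])
            # Scale by the group gate probability so the group router also receives gradient.
            out[mask] = group_probs[mask, g].unsqueeze(-1) * yg
        return out, balance_loss
```

In training, the auxiliary term would typically be added to the task loss with a small weight, e.g. `loss = task_loss + alpha * balance_loss`, so that stage-1 routing stays balanced while stage-2 gates are free to specialize.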