#sparse-activation News & Analysis

2 articles tagged with #sparse-activation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 2 · 7/10


ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

Researchers introduce ProbMoE, a probabilistic routing framework that addresses a core difficulty in training Mixture-of-Experts models: standard top-k routing picks experts through a discrete, non-differentiable selection. ProbMoE replaces it with a differentiable probabilistic approach. The authors report comparable or improved performance across several benchmarks, along with dynamic expert allocation and better expert utilization.
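The summary does not spell out ProbMoE's formulation, so the sketch below does not implement it. It only illustrates the problem the paper targets: with conventional top-k gating, nudging the score of an expert that was not selected leaves the gate weights unchanged, so that expert's router score receives no gradient signal, whereas a dense softmax gate responds smoothly. The function names, the four example router scores, and the finite-difference `sensitivity` check are illustrative assumptions, not part of the paper.

```python
import math
from functools import partial


def softmax(logits):
    """Numerically stable softmax over a list of router scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_gates(logits, k):
    """Conventional top-k gating: keep the k highest-probability experts,
    renormalize their weights, and give every other expert exactly 0.
    Choosing *which* experts survive is a discrete step, so a small change
    to an unselected expert's score leaves the output unchanged."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = set(ranked[:k])
    kept = sum(probs[i] for i in chosen)
    return [probs[i] / kept if i in chosen else 0.0 for i in range(len(probs))]


def soft_gates(logits, temperature=1.0):
    """Dense softmax gating: every expert gets a smooth, nonzero weight, so
    every router score receives a gradient (at the cost of running all experts)."""
    return softmax([x / temperature for x in logits])


def sensitivity(gate_fn, logits, idx, eps=1e-4):
    """Finite-difference estimate of how far the gate vector moves when router
    score `idx` is nudged by `eps` (a stand-in for a gradient magnitude)."""
    bumped = list(logits)
    bumped[idx] += eps
    before, after = gate_fn(logits), gate_fn(bumped)
    return max(abs(a - b) for a, b in zip(before, after)) / eps


if __name__ == "__main__":
    router_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 experts
    top2 = partial(topk_gates, k=2)
    print("top-2 gates:", [round(g, 3) for g in top2(router_logits)])
    print("soft gates: ", [round(g, 3) for g in soft_gates(router_logits)])
    # Expert 3 is not in the top 2, so nudging its score has (numerically) no effect.
    print("top-2 sensitivity to expert 3:", sensitivity(top2, router_logits, 3))
    print("soft  sensitivity to expert 3:", sensitivity(soft_gates, router_logits, 3))
```

Dense softmax routing avoids the gradient problem only by running every expert, which gives up the sparse activation that makes MoE models cheap per token.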

AI · Bullish · arXiv – CS AI · May 27 · 7/10


The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax introduces the M2 series of Mixture-of-Experts language models, with 229.9B total parameters but only 9.8B (about 4.3%) activated per token. The models reach frontier-tier performance on agentic tasks, which the authors attribute to agent-driven data pipelines and a custom reinforcement learning system called Forge. The M2.7 checkpoint shows early self-evolution capabilities, autonomously debugging and modifying its own training scaffold.
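The gap between total and activated parameters is the core of sparse activation. The sketch below works through the arithmetic under a simplifying assumption: every token uses all shared (non-expert) weights plus `top_k` of `num_experts` equally sized experts. The helper names and the toy configuration are hypothetical and do not describe MiniMax-M2's actual layout, which the summary does not give; only the 229.9B and 9.8B figures come from the summary.

```python
def moe_param_counts(shared, per_expert, num_experts, top_k):
    """Total vs. per-token active parameters for a simple MoE model in which
    each token uses all shared weights plus `top_k` of `num_experts` equally
    sized experts (a simplifying assumption). Units are whatever you pass in."""
    total = shared + num_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


def active_fraction(active, total):
    """Share of all parameters that a single token actually uses."""
    return active / total


if __name__ == "__main__":
    # Figures reported for MiniMax-M2 in the summary (billions of parameters).
    reported = active_fraction(9.8, 229.9)
    print(f"M2 active fraction: {reported:.2%} "
          f"(about 1 in {1 / reported:.1f} weights per token)")

    # Hypothetical toy configuration, NOT MiniMax-M2's real architecture.
    toy_total, toy_active = moe_param_counts(
        shared=2.0, per_expert=1.0, num_experts=64, top_k=4)
    print(f"toy model: {toy_active} of {toy_total} active "
          f"({active_fraction(toy_active, toy_total):.2%})")
```

For the reported figures, 9.8 / 229.9 formats as 4.26%, so each token touches roughly one weight in every 23.5.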