🧠 AI · Neutral · Importance 6/10

SDG-MoE: Signed Debate Graph Mixture-of-Experts

arXiv – CS AI | Stepan Kulibaba, Kirill Labzin, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gansnikov, Aleksei Shpilman
🤖 AI Summary

Researchers introduce SDG-MoE, a novel mixture-of-experts architecture that enables deliberation among routed experts through signed graph communication before output aggregation. The model demonstrates a 19.8% perplexity improvement over vanilla MoE and achieves state-of-the-art results on multiple language modeling benchmarks while maintaining computational efficiency.

Analysis

SDG-MoE addresses a fundamental limitation of sparse mixture-of-experts models: the lack of communication among selected experts during inference. Traditional MoE systems route each token to specific experts, which process it independently before their outputs are combined by the routing weights. This research adds an iterative deliberation phase in which experts influence one another through learned interaction matrices capturing both reinforcing (support) and corrective (critique) relationships.
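
To make the baseline concrete, here is a minimal PyTorch sketch of a vanilla top-k MoE layer of the kind described above: the router picks k experts per token, each expert processes its input independently, and the outputs are combined with the routing weights. All module names, dimensions, and the expert MLP shape are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a vanilla top-k MoE layer (the baseline SDG-MoE improves
# on). Illustrative only; names and shapes are assumptions, not paper code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VanillaMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts.
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    # Each expert processes its tokens independently; outputs
                    # are combined only by the router's weights.
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```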

The architecture's innovation centers on three mechanisms: dual signed graphs capturing different interaction types, message-passing that updates expert representations based on these interactions, and a disagreement-gated anchoring mechanism that prevents expert drift while scaling deliberation intensity with disagreement levels. This design maintains computational efficiency by operating only on the active expert subset, adding minimal overhead compared to vanilla MoE.
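
A hedged sketch of such a deliberation phase, assuming dense learned support and critique matrices sliced to the active top-k subset, signed tanh-bounded message passing, and a variance-based disagreement gate that interpolates between the anchored initial outputs and the updated states. The exact parameterization, gating function, and update rule in the paper may differ.

```python
# Hedged sketch of signed-graph deliberation among the k active experts.
# The gating form, graph parameterization, and update rule are assumptions
# for illustration, not the paper's exact equations.
import torch
import torch.nn as nn

class SignedDeliberation(nn.Module):
    def __init__(self, d_model: int, n_experts: int, steps: int = 2):
        super().__init__()
        self.steps = steps
        # Interaction logits over all expert pairs; zero-init so deliberation
        # starts as an identity map. Only the routed top-k subset is sliced
        # out at runtime, so cost scales with k, not with n_experts.
        self.support = nn.Parameter(torch.zeros(n_experts, n_experts))
        self.critique = nn.Parameter(torch.zeros(n_experts, n_experts))
        self.msg = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        # h: (tokens, k, d_model) active expert states; idx: (tokens, k) ids.
        anchor = h  # initial expert outputs, used to prevent drift
        S = self.support[idx.unsqueeze(-1), idx.unsqueeze(1)]   # (tokens, k, k)
        C = self.critique[idx.unsqueeze(-1), idx.unsqueeze(1)]  # (tokens, k, k)
        for _ in range(self.steps):
            m = self.msg(h)
            # Support messages reinforce; critique messages enter with a
            # negative sign, acting as corrections.
            update = torch.tanh(S) @ m - torch.tanh(C) @ m
            # Disagreement gate: variance across the k expert states scales
            # the deliberation step; low disagreement stays near the anchor.
            gate = h.var(dim=1, keepdim=True).mean(-1, keepdim=True).sigmoid()
            h = (1 - gate) * anchor + gate * (h + update)
        return h
```

Because the interaction tensors are built only over the k routed experts, the added cost grows with k squared rather than with the total expert count, which matches the efficiency argument above.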

The theoretical framework establishes stability conditions for expert states during deliberation, providing mathematical grounding for the approach. Empirical results across three independent pretraining runs show consistent improvements, with the model achieving best-in-class perplexity on WikiText-103, C4, and Paloma benchmarks—established evaluation standards in the field.
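
The summary does not reproduce the paper's actual theorem, but for an anchored affine update of the kind sketched above, a standard sufficient condition for stability follows from the Banach fixed-point theorem. The notation below (M for the combined signed message operator, gamma for a fixed gate value) is an illustrative assumption, not the authors' statement.

```latex
% Generic stability sketch for an anchored linear deliberation update
% (illustrative; not the paper's exact theorem or notation).
% The stacked expert state h^{(t)} relaxes toward the anchor h^{(0)}:
\[
  h^{(t+1)} = (1-\gamma)\, h^{(0)} + \gamma\,(I + M)\, h^{(t)},
  \qquad 0 < \gamma < 1.
\]
% The map is a contraction, so iterates converge to a unique fixed point,
% whenever the gated operator has spectral norm below one:
\[
  \gamma \,\lVert I + M \rVert_2 < 1
  \;\Longrightarrow\;
  h^{(t)} \to h^{\star} = (1-\gamma)\,\bigl(I - \gamma(I+M)\bigr)^{-1} h^{(0)}.
\]
```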

The implications extend beyond pure performance metrics. By enabling structured expert communication while preserving specialization, SDG-MoE suggests that expert collaboration mechanisms could unlock better language model efficiency. This approach bridges two competing design philosophies: the modularity benefits of MoE and the collaborative intelligence of dense models. For practitioners building large-scale language systems, this demonstrates that thoughtful inter-expert communication protocols may yield significant improvements without proportional computational costs.

Key Takeaways
  • SDG-MoE enables expert deliberation through signed graph communication, improving validation perplexity by 19.8% over standard MoE baselines.
  • The architecture uses separate support and critique graphs to model reinforcing and corrective expert interactions in a structured way.
  • Disagreement-gated anchoring prevents expert drift while scaling deliberation intensity with the level of disagreement among experts.
  • Theoretical analysis establishes stability conditions for expert states and shows that communication overhead scales only with the active expert subset (see the sketch after this list).
  • Model achieves state-of-the-art perplexity results on WikiText-103, C4, and Paloma benchmarks in controlled pretraining experiments.
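
To make the scaling takeaway concrete, the following snippet wires together the two Python sketches above. The deliberation module's interaction tensors have shape (tokens, k, k), so its cost depends on the routed subset size k, not on the total number of experts. This is a purely illustrative composition, not the authors' pipeline.

```python
# Composing the sketches above: deliberation touches only the k routed
# experts, independent of the total expert count n. Illustrative wiring only.
import torch

d_model, n_experts, k, tokens = 64, 8, 2, 16
moe = VanillaMoE(d_model, n_experts, k)
delib = SignedDeliberation(d_model, n_experts, steps=2)

x = torch.randn(tokens, d_model)
weights, idx = moe.router(x).topk(k, dim=-1)
weights = weights.softmax(dim=-1)

# Per-token states of the k active experts before deliberation.
h = torch.stack([torch.stack([moe.experts[int(e)](x[t]) for e in idx[t]])
                 for t in range(tokens)])            # (tokens, k, d_model)
h = delib(h, idx)                                    # signed-graph deliberation
y = (weights.unsqueeze(-1) * h).sum(dim=1)           # weighted aggregation
```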