AI · Neutral · arXiv – CS AI · 8h ago · 6/10
LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling
Researchers introduce LoopMoE, a language model architecture that combines sparse Mixture-of-Experts routing with iterative, weight-shared computation. The model outperforms standard MoE baselines at the 3B and 9B scales under identical parameter budgets and computational costs, suggesting that recurrent architectures offer efficiency gains beyond parameter scaling.
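To make the idea concrete, here is a minimal NumPy sketch of what "sparse routing plus iterative weight sharing" could look like: one top-k routed MoE layer applied several times with the same weights. It is an illustration only, not the paper's implementation. The function names (`moe_layer`, `looped_moe`), the tanh experts, the residual update, and all sizes are assumptions chosen for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(h, router_w, expert_w, top_k):
    """Hypothetical top-k routed MoE layer.

    h:        (tokens, d) hidden states
    router_w: (d, experts) router weights
    expert_w: (experts, d, d) one weight matrix per expert
    """
    logits = h @ router_w                               # (tokens, experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]       # top_k expert ids per token
    gates = softmax(np.take_along_axis(logits, top, axis=-1), axis=-1)
    out = np.zeros_like(h)
    for slot in range(top_k):
        idx = top[:, slot]                              # chosen expert per token
        # Apply each token's chosen expert matrix to that token only.
        expert_out = np.einsum("td,tde->te", h, expert_w[idx])
        out += gates[:, slot:slot + 1] * np.tanh(expert_out)
    return out

def looped_moe(h, router_w, expert_w, top_k=2, n_loops=3):
    # Assumption: the SAME router and experts are reused on every iteration,
    # so extra loops add computation steps but no parameters.
    for _ in range(n_loops):
        h = h + moe_layer(h, router_w, expert_w, top_k)  # residual update
    return h

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 5
router_w = rng.normal(size=(d, n_experts))
expert_w = rng.normal(size=(n_experts, d, d)) / np.sqrt(d)
h0 = rng.normal(size=(tokens, d))
h3 = looped_moe(h0, router_w, expert_w, top_k=2, n_loops=3)
```

In this sketch the parameter count is fixed by `router_w` and `expert_w` alone, while `n_loops` controls how many times they are applied. How LoopMoE itself matches compute against a non-looped baseline is not described in the summary above.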