AI · Neutral · arXiv – CS AI · 6h ago · 6/10
DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts
Researchers propose DAG-MoE, a new Mixture-of-Experts architecture that improves large language model scaling by optimizing how expert outputs are aggregated rather than just increasing expert count. The framework uses structural aggregation instead of weighted summation, enabling multi-step reasoning within a single layer while reducing routing overhead and improving both pretraining and fine-tuning performance.
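
To make the contrast concrete, here is a minimal sketch of the two aggregation styles. It is an illustration under stated assumptions, not the paper's method: the summary does not say how DAG-MoE builds or routes over its graph, so the function names (`weighted_sum_moe`, `dag_moe`), the `parents` mapping, and the rule that each expert receives the token plus its parents' outputs are all hypothetical. The only assumption about the paper itself is that "DAG" refers to a directed acyclic graph over the selected experts.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    """Multiply every element of a vector by a scalar."""
    return [x * s for x in a]


def weighted_sum_moe(x: Vector, experts: Sequence[Expert],
                     gates: Sequence[float]) -> Vector:
    """Conventional MoE aggregation: y = sum_i g_i * E_i(x)."""
    y = [0.0] * len(x)
    for expert, g in zip(experts, gates):
        y = add(y, scale(expert(x), g))
    return y


def dag_moe(x: Vector, experts: Sequence[Expert], gates: Sequence[float],
            parents: Dict[int, List[int]]) -> Vector:
    """Hypothetical structural aggregation (NOT the paper's exact rule).

    Experts are indexed in topological order. Expert i receives the token
    plus the outputs of its parent experts, so later experts can build on
    earlier ones inside a single layer. The layer output is still the
    gate-weighted sum of all expert outputs.
    """
    outputs: List[Vector] = []
    y = [0.0] * len(x)
    for i, expert in enumerate(experts):
        inp = list(x)
        for p in parents.get(i, []):
            if p >= i:
                raise ValueError("parents must precede children (acyclic order)")
            inp = add(inp, outputs[p])
        h = expert(inp)
        outputs.append(h)
        y = add(y, scale(h, gates[i]))
    return y


# Toy experts standing in for feed-forward networks (illustrative only).
def double(v: Vector) -> Vector:
    return [2 * t for t in v]


def increment(v: Vector) -> Vector:
    return [t + 1 for t in v]


x = [1.0, 2.0]
experts = [double, increment]
gates = [0.5, 0.5]

flat = weighted_sum_moe(x, experts, gates)         # experts act independently
no_edges = dag_moe(x, experts, gates, {})          # same result as flat
chained = dag_moe(x, experts, gates, {1: [0]})     # expert 1 sees expert 0's output
print(flat, no_edges, chained)
```

With an empty `parents` mapping the sketch reduces to the ordinary weighted sum, which is one way to read "structural aggregation" as a generalization of the standard scheme. Adding the edge from expert 0 to expert 1 lets the second expert operate on the first expert's result, a toy version of multi-step computation within one layer.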