#moe-architecture News & Analysis

8 articles tagged with #moe-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek released V4, a new series of efficient mixture-of-experts language models that support one-million-token context windows. The models cut compute relative to their predecessors while maintaining state-of-the-art performance: V4-Pro requires only 27% of the inference compute of DeepSeek-V3.2.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Researchers introduced ZEDA, a framework that converts fully-trained Mixture-of-Experts language models into dynamic variants capable of skipping unnecessary experts, reducing computational requirements by over 50% with minimal accuracy loss. The method uses self-distillation to adapt post-trained models without retraining from scratch, achieving ~1.20x end-to-end inference speedup on major language models.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

ZAYA1-8B Technical Report

Zyphra has unveiled ZAYA1-8B, a compact reasoning-focused model with only 700M active parameters that matches larger competitors such as DeepSeek-R1 on mathematics and coding tasks. The model introduces Markovian RSA, a novel test-time compute method that reaches 91.9% on AIME'25 while staying computationally efficient, suggesting that small models can compete with much larger reasoning systems through architectural innovation.

🧠 GPT-5 · 🧠 Gemini

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Researchers demonstrate that calibration (the alignment of model confidence with actual accuracy) behaves differently in mixture-of-experts (MoE) models depending on the routing mechanism. Under distribution shift, expert-level calibration suffices for hard-routed models, while soft-routed models require additional adversarial reweighting to maintain both accuracy and calibration reliability.
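
For readers unfamiliar with the term, calibration is commonly quantified with expected calibration error (ECE): predictions are grouped by confidence, and the gap between average confidence and accuracy is averaged across groups. The sketch below is a minimal, generic illustration of that metric, not the paper's method; the function name, the equal-width binning, and the default of 10 bins are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Generic ECE with equal-width confidence bins (illustrative only).

    confidences: predicted probabilities in [0, 1] for the chosen class
    correct:     booleans, True where the prediction was right
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign each prediction to a bin by its confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# A model that says "90% sure" but is right 3 times out of 4 is overconfident.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                           [True, True, True, False])
```

A lower value means confidence tracks accuracy more closely; in the example above the gap is roughly 0.15, since stated confidence is 0.9 and observed accuracy is 0.75.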

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Researchers propose DAG-MoE, a new Mixture-of-Experts architecture that improves large language model scaling by optimizing how expert outputs are aggregated rather than just increasing expert count. The framework uses structural aggregation instead of weighted summation, enabling multi-step reasoning within a single layer while reducing routing overhead and improving both pretraining and fine-tuning performance.
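
As background, the "weighted summation" that the summary contrasts against is the usual top-k MoE aggregation: the router picks k experts, and their outputs are combined as a gate-weighted sum. The sketch below shows only that baseline under simplifying assumptions (plain Python lists, experts as callables, softmax over the selected logits); it does not attempt to reproduce DAG-MoE's structural aggregation, and all names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_weighted_sum(token, experts, router_logits, top_k=2):
    """Baseline top-k MoE aggregation: y = sum_i gate_i * expert_i(token).

    token:         list of floats (one hidden vector)
    experts:       list of callables mapping a vector to a vector
    router_logits: one score per expert (assumed to come from a router)
    """
    # Pick the k experts with the highest router scores.
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalise the gates over the chosen experts only.
    gates = softmax([router_logits[i] for i in chosen])

    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        for d, value in enumerate(expert_out):
            output[d] += gate * value
    return output


# Three toy experts; the router strongly disfavours the third.
toy_experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [0.0 for _ in x],
]
mixed = moe_weighted_sum([1.0, 2.0], toy_experts, [0.0, 0.0, -5.0])
```

Here the selected experts never see each other's outputs; each contributes independently to the sum, which is the property the summary says DAG-MoE changes.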

AI · Bullish · Hugging Face Blog · Jun 1 · 6/10

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has unveiled Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) language model presented as an advance in open-source AI development. The model performs competitively with larger models while remaining computationally efficient, in line with the broader industry trend toward optimized transformer architectures.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

SDG-MoE: Signed Debate Graph Mixture-of-Experts

Researchers introduce SDG-MoE, a novel mixture-of-experts architecture that enables deliberation among routed experts through signed graph communication before output aggregation. The model demonstrates 19.8% perplexity improvement over vanilla MoE and achieves state-of-the-art results on multiple language modeling benchmarks while maintaining computational efficiency.

🏢 Perplexity

AI · Neutral · arXiv – CS AI · May 11 · 5/10

Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

ENGINEERING Ingegneria Informatica has released EngGPT2MoE-16B-A3B, a 16-billion-parameter Mixture-of-Experts language model that shows competitive or superior performance compared with Italian and international open-source LLMs across multiple benchmarks. The model is a notable advance for Italian-language AI capabilities and holds its own within the global open-source LLM landscape.

🧠 GPT-5 · 🧠 Llama