#mixture-of-experts News & Analysis

130 articles tagged with #mixture-of-experts. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Researchers demonstrate that calibration—aligning model confidence with actual accuracy—behaves differently in mixture-of-experts (MoE) models depending on routing mechanisms. While expert-level calibration suffices for hard-routed models under distribution shift, soft-routed models require additional adversarial reweighting techniques to maintain both accuracy and calibration reliability.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

Researchers propose Multi-Rate Mixture-of-Experts (MR-MoE), a framework that enhances Liquid Neural Networks for time-series modeling by deploying multiple experts operating at different time scales with adaptive gating. The approach combines continuous-time dynamics, multi-scale decomposition, and attention mechanisms to outperform traditional RNNs and monolithic LNNs on complex multivariate time-series tasks.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Researchers propose Manifold Power Iteration (MPI), a novel router redesign method for Mixture-of-Experts models that aligns router rows with principal singular directions of associated experts. The approach uses a "Power-then-Retract" paradigm and demonstrates improved MoE model effectiveness across scales from 1B to 11B parameters.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Resource-Aware LLM Reasoning for Mobile Edge General Intelligence

Researchers propose a joint optimization framework for deploying large language model reasoning on resource-constrained edge devices, combining adaptive chain-of-thought prompting with distributed mixture-of-experts architecture. The framework dynamically balances reasoning quality and computational efficiency by treating reasoning depth as an optimizable network resource, achieving 90% accuracy and latency satisfaction with minimal inference overhead.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

MoE Enhanced Federated Learning for Spatiotemporal Prediction

Researchers propose MoE-FedTP, a federated learning framework using Mixture-of-Experts networks to improve traffic prediction across cities while preserving privacy. The system enables data-rich cities to share knowledge with data-scarce regions by dynamically fusing expert networks tailored to different urban environments, achieving superior accuracy without centralized data collection.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Mixtures of Neural Operators Reduce Active Complexity in Operator Learning

Researchers demonstrate that mixtures of neural operators (MoNOs) reduce computational complexity in operator learning by routing inputs through expert models rather than using a single large model. The approach achieves better scaling properties with depth, width, and rank while maintaining approximation quality, with implications for efficient AI system design.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

Researchers introduce LongMoE, a machine learning framework designed to improve clinical AI systems by simultaneously handling missing patient data and tracking disease progression over time. The model combines mixture-of-experts routing with temporal pattern recognition, demonstrating improvements across major medical datasets (ADNI, OASIS-3, MIMIC-IV).

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

Researchers propose TRACE, a novel machine unlearning technique designed specifically for Mixture-of-Experts language models that addresses the problem of forget-critical experts receiving insufficient regularization during the unlearning process. The method achieves 9% relative utility improvements by detecting and calibrating expert activation patterns to match forget and retain data distributions, demonstrating consistent performance gains across multiple MoE architectures.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting

Researchers introduce FAME, a sparse mixture-of-experts framework that dynamically routes time series forecasting tasks to specialized models based on data characteristics. Tested on a production retail dataset with 5,000+ vending machines, the system achieves 12.4% MSE improvement over single-model baselines while using only 1.92 experts per series, demonstrating practical advantages for large-scale commercial forecasting systems.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

Researchers introduce SETA, a machine learning framework that addresses catastrophic forgetting in large language models through sparse expert decomposition. The method separates task-specific and shared knowledge into distinct expert modules, enabling models to retain previous capabilities while learning new ones—a fundamental challenge in continual AI development.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

Researchers propose VSRAQ, a quantization technique designed specifically for Mixture-of-Experts models that prevents routing instability during model compression. By preserving expert-selection behavior through value and structure alignment, the method enables efficient deployment of large MoE models without quality degradation.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

Researchers propose a sparse Mixture-of-Experts (MoE) reward model that learns interpretable, specialized experts for modeling diverse human preferences in RLHF systems. By encouraging sparse routing during training on binary preference data, the approach improves both interpretability and personalization capabilities compared to universal reward function models.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

Researchers introduce LoopMoE, a language model architecture combining Mixture-of-Experts sparse routing with iterative weight-sharing computation. The model outperforms standard MoE baselines at 3B and 9B scales while maintaining identical parameter budgets and computational costs, suggesting recurrent architectures offer efficiency gains beyond parameter scaling.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis

Researchers propose PTGAMoE, a semantic-preserving graph-based deep learning framework for encrypted traffic analysis that outperforms existing models by respecting protocol hierarchies and field-level structures. The approach combines graph attention mechanisms with mixture-of-experts design to improve both accuracy in traffic classification and interpretability of model decisions.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Researchers propose Task-Aware Coactivation Grouping (TACG), a framework for optimizing Mixture-of-Experts (MoE) model inference across distributed GPUs by grouping experts based on task-specific activation patterns rather than global averages. The approach reduces communication costs by 31.39% while maintaining load balance, addressing a critical efficiency bottleneck in multi-task AI serving.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

GC-MoE: Genomics-Guided Cell-Type-Specific Mixture of Experts for Histology-Based Single-Cell Spatial Transcriptomics

Researchers introduce GC-MoE, a machine learning framework that predicts individual cell gene expression from histopathology images and spatial data, addressing limitations of existing methods that only work at the spot level. The approach combines cell-type-specific expert models with genomic guidance to capture cellular expression variability more accurately than current baselines.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion

Researchers introduce EMoE, a training-free method that leverages expert disagreement within mixture-of-experts diffusion models to estimate uncertainty in text-to-image generation. The approach measures variance among expert pathways after a single denoising step, enabling early detection of poorly aligned prompts without additional training or auxiliary networks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Researchers propose DAG-MoE, a new Mixture-of-Experts architecture that improves large language model scaling by optimizing how expert outputs are aggregated rather than just increasing expert count. The framework uses structural aggregation instead of weighted summation, enabling multi-step reasoning within a single layer while reducing routing overhead and improving both pretraining and fine-tuning performance.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts

Researchers introduce DEFT, a new deep reinforcement learning architecture using a mixture-of-experts approach to optimize cloud workflow scheduling with varying deadline constraints. The system uses a graph-adaptive gating mechanism to route scheduling decisions through specialized experts, demonstrating improved performance in reducing execution costs and deadline violations compared to existing DRL baselines.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models

Researchers introduce AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly addresses the asymmetrical processing of visual and linguistic data. The approach uses hyperbolic geometry for hierarchical relationships and evidence-priority mechanisms to improve accuracy by up to 3.8% on hallucination-sensitive tasks while reducing parameter activation by 25.45% compared to dense models.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

Researchers introduce MoEIoU, a novel machine learning approach that reformulates bounding-box regression for object detection using a mixture-of-experts framework. The method dynamically balances multiple localization objectives during training, outperforming existing solutions across standard benchmarks and architectures.

AI · Bullish · Hugging Face Blog · Jun 1 · 6/10

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has unveiled Mellum2, a 12 billion parameter Mixture-of-Experts (MoE) language model that represents a significant advancement in open-source AI development. The model demonstrates competitive performance with larger models while maintaining computational efficiency, reflecting the broader industry trend toward optimized transformer architectures.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

Researchers propose GC-MoE, a graph-conditioned mixture of experts framework that improves traffic forecasting by assigning specialized neural network experts to different road segments based on graph topology. The approach trains only 17K parameters while leveraging 1.5M frozen expert weights, achieving competitive results across four standard traffic prediction benchmarks.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?

Researchers propose dynamic Stiefel routing, a novel machine learning approach using expert projection filters on the Stiefel manifold to improve cross-domain EEG decoding without requiring target-domain calibration data. The method addresses a fundamental degeneracy problem where naive routing collapses to ensemble averaging, introducing three structural properties that enable genuine domain-specialized routing with significant accuracy improvements across datasets.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Mixture of Concept Bottleneck Experts

Researchers introduce Mixture of Concept Bottleneck Experts (M-CBE), a framework that enhances interpretable AI by allowing multiple expert expressions to map concepts to predictions rather than a single predetermined function. The approach combines Linear M-CBE and Symbolic M-CBE variants to improve both accuracy and adaptability while maintaining human-understandable decision-making processes.
