arXiv · CS AI · 5d ago
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Researchers analyzed 20 Mixture-of-Experts (MoE) language models to study local routing consistency, i.e. how much consecutive tokens reuse the same experts, and found a trade-off between routing consistency and local load balance. The study introduces metrics that quantify this consistency, which in turn determines how effectively expert offloading (keeping only the most-used experts in fast memory) can reduce memory usage on resource-constrained devices while maintaining inference speed.
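A minimal sketch of the intuition, not the paper's actual metrics or code: the hypothetical `segment_cache_hit_rate` below scores an expert cache that is refilled with the previous segment's most-used experts, on synthetic routing traces. A router with high local consistency keeps hitting the cached experts; a perfectly load-balanced but inconsistent router mostly misses, which is why offloading suits some models and not others.

```python
import random
from collections import Counter

def segment_cache_hit_rate(expert_trace, cache_size, segment_len):
    """Fraction of expert activations served by a cache that holds the
    `cache_size` most frequently used experts of the previous segment.

    expert_trace: list of per-token expert-id lists (top-k routing choices).
    (Hypothetical helper for illustration, not from the paper.)
    """
    hits = total = 0
    cached = set()
    for start in range(0, len(expert_trace), segment_len):
        segment = expert_trace[start:start + segment_len]
        for token_experts in segment:
            hits += sum(e in cached for e in token_experts)
            total += len(token_experts)
        # Refill the cache with the experts most used in this segment,
        # so the next segment is scored against them.
        counts = Counter(e for tok in segment for e in tok)
        cached = {e for e, _ in counts.most_common(cache_size)}
    return hits / max(total, 1)

random.seed(0)
n_experts, top_k, n_tokens = 64, 2, 4096

# Locally consistent router: nearby tokens draw from a small expert pool
# that drifts only occasionally.
consistent = []
pool = random.sample(range(n_experts), 8)
for t in range(n_tokens):
    if t % 256 == 0:
        pool = random.sample(range(n_experts), 8)
    consistent.append(random.sample(pool, top_k))

# Inconsistent (well load-balanced) router: experts drawn uniformly.
inconsistent = [random.sample(range(n_experts), top_k) for _ in range(n_tokens)]

for name, trace in [("consistent", consistent), ("inconsistent", inconsistent)]:
    rate = segment_cache_hit_rate(trace, cache_size=8, segment_len=128)
    print(f"{name:>12} routing: cache hit rate = {rate:.2f}")
```

On this toy trace the consistent router's hit rate is several times higher than the uniform one's, even though the uniform router has perfect load balance, mirroring the trade-off the paper describes.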