arXiv · CS AI · 5d ago
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Researchers analyzed 20 Mixture-of-Experts (MoE) language models to study local routing consistency, i.e. how much consecutive tokens reuse the same experts, and found a trade-off between routing consistency and local load balance. The study introduces metrics that quantify this consistency, which in turn determines how effectively expert offloading (keeping only the most-used experts in fast memory) can reduce memory usage on resource-constrained devices while maintaining inference speed.
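A minimal sketch of the intuition, not the paper's actual metrics or code: the hypothetical `segment_cache_hit_rate` below scores an expert cache that is refilled with the previous segment's most-used experts, on synthetic routing traces. A router with high local consistency keeps hitting the cached experts; a perfectly load-balanced but inconsistent router mostly misses, which is why offloading suits some models and not others.

```python
import random
from collections import Counter

def segment_cache_hit_rate(expert_trace, cache_size, segment_len):
    """Fraction of expert activations served by a cache that holds the
    `cache_size` most frequently used experts of the previous segment.

    expert_trace: list of per-token expert-id lists (top-k routing choices).
    (Hypothetical helper for illustration, not from the paper.)
    """
    hits = total = 0
    cached = set()
    for start in range(0, len(expert_trace), segment_len):
        segment = expert_trace[start:start + segment_len]
        for token_experts in segment:
            hits += sum(e in cached for e in token_experts)
            total += len(token_experts)
        # Refill the cache with the experts most used in this segment,
        # so the next segment is scored against them.
        counts = Counter(e for tok in segment for e in tok)
        cached = {e for e, _ in counts.most_common(cache_size)}
    return hits / max(total, 1)

random.seed(0)
n_experts, top_k, n_tokens = 64, 2, 4096

# Locally consistent router: nearby tokens draw from a small expert pool
# that drifts only occasionally.
consistent = []
pool = random.sample(range(n_experts), 8)
for t in range(n_tokens):
    if t % 256 == 0:
        pool = random.sample(range(n_experts), 8)
    consistent.append(random.sample(pool, top_k))

# Inconsistent (well load-balanced) router: experts drawn uniformly.
inconsistent = [random.sample(range(n_experts), top_k) for _ in range(n_tokens)]

for name, trace in [("consistent", consistent), ("inconsistent", inconsistent)]:
    rate = segment_cache_hit_rate(trace, cache_size=8, segment_len=128)
    print(f"{name:>12} routing: cache hit rate = {rate:.2f}")
```

On this toy trace the consistent router's hit rate is several times higher than the uniform one's, even though the uniform router has perfect load balance, mirroring the trade-off the paper describes.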