AI · Neutral · arXiv – CS AI · 5d ago

Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models

Researchers analyzed 20 Mixture-of-Experts (MoE) language models to study local routing consistency, finding a trade-off between routing consistency and local load balance. The study introduces metrics that quantify this consistency, indicating how effectively expert offloading can reduce memory usage on resource-constrained devices without sacrificing inference speed.
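The core idea of local routing consistency is that neighboring tokens tend to reuse the same experts, which lets an offloading system keep a small cache of "hot" experts in fast memory. As an illustrative sketch (not the paper's actual metric), one could score consistency as the fraction of expert reuse within a sliding window of tokens, where 1 means neighbors pick identical top-k expert sets and 0 means fully disjoint sets:

```python
def local_routing_consistency(expert_ids, window=2):
    """Hypothetical consistency score over per-token top-k expert choices.

    expert_ids: list of per-token lists of expert indices (each length k).
    Returns a value in [0, 1]: 1 if tokens in each window reuse the same
    k experts (cache-friendly for offloading), 0 if their expert sets
    are fully disjoint (every token forces a cache miss).
    """
    k = len(expert_ids[0])
    scores = []
    for t in range(len(expert_ids) - window + 1):
        # Count distinct experts activated inside this window; the fewer
        # distinct experts beyond the first token's k, the higher the reuse.
        used = set()
        for row in expert_ids[t : t + window]:
            used.update(row)
        scores.append(1 - (len(used) - k) / (k * (window - 1)))
    return sum(scores) / len(scores)

# Perfect reuse: every token routes to experts {0, 1}.
print(local_routing_consistency([[0, 1], [0, 1], [0, 1]]))  # 1.0
# No reuse: consecutive tokens pick disjoint expert sets.
print(local_routing_consistency([[0, 1], [2, 3]]))          # 0.0
```

Under a score like this, a model with high local consistency would let an offloading runtime prefetch and pin a few experts per segment, while a low score would imply frequent expert swaps from slow storage.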