AIBullish · arXiv – CS AI · 15h ago · 6/10
MoEless: Efficient MoE LLM Serving via Serverless Computing
Researchers introduce MoEless, a serverless framework for serving Mixture-of-Experts (MoE) Large Language Models that addresses expert load imbalance. By combining predictive load balancing with optimized expert scaling, the system reduces inference latency by 43% and serving costs by 84% compared to existing solutions.
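The post gives no implementation details, so the following is only a rough illustration of what predictive, per-expert scaling could look like, not MoEless's actual design. Every name here (`ExpertScaler`, `WINDOW`, `TOKENS_PER_REPLICA`) and the moving-average predictor are assumptions for the sketch.

```python
from collections import deque

# Hypothetical sketch (not the authors' code): predict per-expert token load
# from a sliding window of recent batches, then choose a replica count per
# expert so that hot experts get more serverless instances than cold ones.

WINDOW = 8                 # assumed: number of recent batches to average over
TOKENS_PER_REPLICA = 512   # assumed: tokens one expert replica can handle per batch

class ExpertScaler:
    def __init__(self, num_experts: int):
        self.history = [deque(maxlen=WINDOW) for _ in range(num_experts)]

    def observe(self, tokens_per_expert: list[int]) -> None:
        """Record how many tokens each expert received in the latest batch."""
        for hist, n in zip(self.history, tokens_per_expert):
            hist.append(n)

    def predict_load(self) -> list[float]:
        """Naive predictor: moving average of recent per-expert token counts."""
        return [sum(h) / len(h) if h else 0.0 for h in self.history]

    def target_replicas(self) -> list[int]:
        """Ceiling-divide predicted load by replica capacity; keep at least one replica."""
        return [max(1, -(-int(load) // TOKENS_PER_REPLICA))
                for load in self.predict_load()]

if __name__ == "__main__":
    scaler = ExpertScaler(num_experts=4)
    scaler.observe([1200, 90, 400, 10])   # expert 0 is "hot"
    scaler.observe([1500, 60, 350, 25])
    print(scaler.target_replicas())       # -> [3, 1, 1, 1]
```

A real system would also have to account for cold-start latency and scale-down hysteresis when spinning serverless expert instances up and down; this sketch only shows the load-prediction-to-replica-count step.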