
#gpu-utilization News & Analysis

2 articles tagged with #gpu-utilization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

Researchers present the first systematic study of performance-energy trade-offs in multi-request LLM inference workflows, using NVIDIA A100 GPUs and the vLLM and Parrot serving systems. The study identifies batch size as the most impactful optimization lever, though its effectiveness varies by workload type, and shows that workflow-aware scheduling can reduce energy consumption under power constraints.
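The batch-size trade-off the summary describes can be illustrated with a toy analytic model (this is not the paper's methodology; all constants below are invented for illustration): batching amortizes fixed per-batch latency, lowering energy per request, but larger batches also raise power draw and per-batch latency, so the curve eventually turns back up.

```python
def energy_per_request(batch_size, base_power_w=250.0, per_req_power_w=15.0,
                       fixed_latency_s=0.05, per_req_latency_s=0.02):
    """Toy model (hypothetical constants, not from the paper): energy per
    request = batch latency * average power / batch size. The fixed
    per-batch latency is amortized across the batch, but power and
    latency both grow with batch size, so a sweet spot emerges."""
    latency_s = fixed_latency_s + per_req_latency_s * batch_size
    power_w = base_power_w + per_req_power_w * batch_size
    return latency_s * power_w / batch_size  # joules per request

# Energy per request falls, bottoms out, then rises as batches grow.
for bs in (1, 4, 16, 64):
    print(bs, energy_per_request(bs))
```

Under these made-up constants the minimum lands at a mid-sized batch, which mirrors the study's finding that the best batch size depends on the workload rather than being "as large as possible."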

๐Ÿข Nvidia
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

MoEless: Efficient MoE LLM Serving via Serverless Computing

Researchers introduce MoEless, a serverless framework for serving Mixture-of-Experts Large Language Models that addresses expert load imbalance issues. The system reduces inference latency by 43% and costs by 84% compared to existing solutions by using predictive load balancing and optimized expert scaling strategies.
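The expert-scaling idea in the summary can be sketched in a few lines (a hypothetical simplification, not MoEless's actual algorithm): given a predicted request load per expert, provision enough serverless replicas to cover each expert's load, so hot experts scale out while cold ones keep a single instance.

```python
import math

def plan_replicas(predicted_loads, capacity_per_replica):
    """Hypothetical sketch of predictive expert scaling: allocate
    ceil(load / capacity) replicas per expert, with a floor of one,
    so replica counts track the predicted load imbalance."""
    return {expert: max(1, math.ceil(load / capacity_per_replica))
            for expert, load in predicted_loads.items()}

# A skewed load (one hot expert, two cold ones) yields a skewed plan.
plan = plan_replicas({"expert_0": 900, "expert_1": 120, "expert_2": 40},
                     capacity_per_replica=300)
print(plan)
```

The point of the sketch is the shape of the mechanism: scaling decisions are driven by a load forecast per expert rather than by uniform replication, which is how a serverless MoE deployment can avoid paying for idle capacity on rarely-routed experts.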