y0news

#resource-efficiency News & Analysis

2 articles tagged with #resource-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Researchers propose SUN (Shared Use of Next-token Prediction), an approach to multi-LLM serving that decomposes transformers into separate prefill and decode modules so that decode execution can be shared across models. The system achieves up to 2.0x higher throughput per GPU while maintaining accuracy comparable to full fine-tuning, and a quantized version (QSUN) provides an additional 45% speedup.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs

Researchers propose APreQEL, an adaptive mixed-precision quantization method for deploying large language models on edge devices. The approach balances memory, latency, and accuracy by applying different quantization levels to different layers based on their importance and on hardware characteristics.
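
To make the idea of per-layer precision concrete, here is a minimal sketch of importance-driven bit allocation under a memory budget. It is not APreQEL's algorithm: the `Layer` fields, the `importance` score, the greedy strategy, and the bit-width choices are all assumptions made for illustration, and the sketch ignores latency and hardware characteristics entirely.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    importance: float  # hypothetical sensitivity score; higher = more sensitive


def model_bytes(layers, bits):
    """Total weight storage in bytes for a given per-layer bit-width map."""
    return sum(layer.num_params * bits[layer.name] // 8 for layer in layers)


def assign_bits(layers, budget_bytes, choices=(2, 4, 8)):
    """Greedy allocation (an illustrative assumption, not the paper's method).

    Every layer starts at the lowest precision. Layers are then visited from
    most to least important, and each one gets the highest bit-width that
    still keeps the whole model within the memory budget.
    """
    bits = {layer.name: min(choices) for layer in layers}
    if model_bytes(layers, bits) > budget_bytes:
        raise ValueError("budget too small even at the lowest precision")

    for layer in sorted(layers, key=lambda l: l.importance, reverse=True):
        for candidate in sorted(choices, reverse=True):
            if candidate <= bits[layer.name]:
                break  # no higher precision fits for this layer
            trial = {**bits, layer.name: candidate}
            if model_bytes(layers, trial) <= budget_bytes:
                bits = trial
                break
    return bits


# Toy model with made-up sizes and importance scores.
layers = [
    Layer("attention", num_params=1000, importance=0.9),
    Layer("mlp", num_params=4000, importance=0.5),
    Layer("embedding", num_params=2000, importance=0.1),
]
plan = assign_bits(layers, budget_bytes=3500)
print(plan)
```

With this toy budget, the most important layer receives the highest precision and the least important one stays at the lowest. A real method would also need a measured sensitivity metric and a hardware cost model in place of the made-up scores used here.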