y0news
#slo
1 article
AI · Bullish · arXiv – CS AI · 5d ago · 6/104

OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration

OrbitFlow is a new SLO-aware KV cache management system for long-context LLM serving, where SLOs are service-level objectives. It adapts memory allocation at runtime, reconfiguring the KV cache at a fine granularity to manage GPU memory usage dynamically during token generation. The authors report up to 66% better SLO attainment and up to 3.3x higher throughput.
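Long contexts put pressure on GPU memory because every token in a request keeps one key and one value vector per KV head in every layer, so KV cache size grows linearly with context length. The back-of-the-envelope calculator below illustrates that standard estimate. It is not OrbitFlow's method. The model dimensions, the 128K-token context, the 40 GiB budget, and the helper names (`ModelShape`, `kv_bytes_per_token`, `max_concurrent_requests`) are all hypothetical.

```python
from dataclasses import dataclass

GIB = 2**30


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder-only transformer dimensions (illustrative only)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # 2 for FP16/BF16 storage


def kv_bytes_per_token(m: ModelShape) -> int:
    """Bytes of KV cache that one token occupies across all layers."""
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_elem


def kv_bytes_for_request(m: ModelShape, context_tokens: int) -> int:
    """KV cache bytes for a single request holding `context_tokens` tokens."""
    return kv_bytes_per_token(m) * context_tokens


def max_concurrent_requests(m: ModelShape, context_tokens: int, budget_bytes: int) -> int:
    """Equal-length requests that fit in a fixed KV budget (no sharing, no eviction)."""
    return budget_bytes // kv_bytes_for_request(m, context_tokens)


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
    context = 128 * 1024  # hypothetical 128K-token context
    budget = 40 * GIB     # hypothetical GPU memory left for the KV cache
    print(kv_bytes_per_token(shape))                        # 131072 (128 KiB)
    print(kv_bytes_for_request(shape, context) / GIB)       # 16.0
    print(max_concurrent_requests(shape, context, budget))  # 2
```

Under these assumed numbers, one 128K-token request needs 16 GiB of KV cache, so only two such requests fit in the budget. This kind of pressure is what motivates managing KV cache memory dynamically in long-context serving.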