AIBullisharXiv โ CS AI ยท 5d ago6/104
๐ง
OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
OrbitFlow is a new KV cache management system for long-context LLM serving that uses adaptive memory allocation and fine-grained optimization to improve performance. The system achieves up to 66% better SLO attainment and 3.3x higher throughput by dynamically managing GPU memory usage during token generation.