
OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration

arXiv – CS AI | Xinyue Ma, Heelim Hong, Taegeon Um, Jongseop Lee, Seoyeong Choy, Woo-Yeon Lee, Myeongjae Jeon | March 3, 2026 at 05:00 AM

🤖AI Summary

OrbitFlow is a KV cache management system for long-context LLM serving. It adapts KV cache placement at fine granularity, deciding which layers' KV caches stay in GPU memory as memory usage changes during token generation. The system achieves up to 66% better SLO attainment and up to 3.3x higher throughput than existing methods.

Key Takeaways

→OrbitFlow addresses memory management challenges in long-context LLM serving through adaptive KV cache placement decisions.
→The system uses a lightweight ILP solver to optimize which layers' KV caches remain on GPU within memory constraints.
→Performance improvements include up to 66% better SLO attainment for TPOT (time per output token) and up to 48% better SLO attainment for TBT (time between tokens).
→The system achieves a 38% reduction in 95th percentile latency and up to 3.3x higher throughput compared to existing methods.
→OrbitFlow includes a fallback mechanism that temporarily defers high-memory requests during heavy load periods.
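
To make the placement and fallback ideas in the takeaways concrete, here is a minimal Python sketch. It is not OrbitFlow's implementation: the summary does not give the ILP formulation, so the sketch assumes a simple 0/1 selection problem (keep a layer's KV cache on GPU or offload it) and solves it exactly with dynamic programming in place of an ILP solver. The function names, the per-layer `offload_costs`, and the "defer the largest request first" policy are all hypothetical.

```python
from typing import Dict, List, Tuple


def choose_resident_layers(
    kv_sizes: List[int], offload_costs: List[float], gpu_budget: int
) -> List[int]:
    """Pick the layers whose KV cache stays on GPU.

    Assumed model (not from the paper): keeping layer i on GPU uses
    kv_sizes[i] memory units and avoids offload_costs[i] of latency.
    This is a 0/1 knapsack, solved exactly by dynamic programming.
    """
    best_cost = [0.0] * (gpu_budget + 1)  # best avoided cost per budget
    best_set: List[Tuple[int, ...]] = [()] * (gpu_budget + 1)
    for layer, (size, cost) in enumerate(zip(kv_sizes, offload_costs)):
        # Iterate budgets downward so each layer is chosen at most once.
        for b in range(gpu_budget, size - 1, -1):
            candidate = best_cost[b - size] + cost
            if candidate > best_cost[b]:
                best_cost[b] = candidate
                best_set[b] = best_set[b - size] + (layer,)
    return list(best_set[gpu_budget])


def defer_high_memory(
    request_mem: Dict[str, int], gpu_budget: int
) -> Tuple[List[str], List[str]]:
    """Hypothetical fallback: defer the largest requests until the rest fit."""
    admitted = sorted(request_mem, key=lambda r: request_mem[r])  # smallest first
    deferred: List[str] = []
    while admitted and sum(request_mem[r] for r in admitted) > gpu_budget:
        deferred.append(admitted.pop())  # largest remaining request
    return admitted, deferred


if __name__ == "__main__":
    resident = choose_resident_layers([4, 3, 2, 3], [5.0, 4.0, 3.0, 4.5], 6)
    print("layers kept on GPU:", resident)
    print(defer_high_memory({"a": 5, "b": 2, "c": 8}, 8))
```

In a real serving system the sizes would grow with every generated token, so the selection would have to be re-solved repeatedly, which is presumably why the summary stresses that the solver is lightweight.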