arXiv – CS AI
Geometry-Aware Online Scheduling for LLM Serving: From Theoretical Bound to System Practice
Researchers propose Geometry-Aware Online Scheduling and introduce the Smallest Volume First (SVF) algorithm, which optimizes LLM inference by accounting for the dynamic memory footprint of key-value (KV) caches. The approach improves on traditional time-centric scheduling heuristics and, when integrated into vLLM, yields significant latency reductions and throughput gains.