AI · Bullish · arXiv – CS AI · 10h ago · 6/10
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving
Researchers present KV-RM, a runtime optimization that regularizes KV-cache memory movement in static-graph LLM decoders, improving throughput and reducing latency variability while preserving the predictability of static-graph execution. The approach decouples each sequence's logical KV history from its physical storage using a block pager and a merge-staged transport mechanism, and the authors report practical improvements on multi-GPU systems.
🏢 Nvidia
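To make the idea of decoupling logical KV histories from physical storage concrete, here is a minimal Python sketch of a generic block pager. It is not KV-RM's implementation: the class `BlockPager`, its methods, and the block sizes are hypothetical names chosen for illustration, and the merge-staged transport step is not modeled at all. The sketch only shows how a per-sequence block table can map logical token positions onto fixed-size physical blocks drawn from a shared pool.

```python
class BlockPager:
    """Toy pager (hypothetical, for illustration only).

    Each sequence sees a contiguous logical history of token slots,
    while the slots actually live in fixed-size physical blocks that
    are handed out from a shared free list.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        # pop() takes from the end, so reverse to hand out block 0 first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            # The current block is full (or none exists yet): grab a new one.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def resolve(self, seq_id, pos):
        """Translate a logical token position into (physical_block, offset)."""
        if not 0 <= pos < self.lengths.get(seq_id, 0):
            raise IndexError(f"position {pos} out of range for {seq_id!r}")
        table = self.block_tables[seq_id]
        return table[pos // self.block_size], pos % self.block_size

    def release(self, seq_id):
        """Return all of a finished sequence's blocks to the free list."""
        self.free_blocks.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)


pager = BlockPager(num_blocks=4, block_size=2)
pager.append_token("a")
pager.append_token("b")
pager.append_token("a")
pager.append_token("a")  # sequence "a" spills into a second, non-adjacent block
```

In this toy run, sequence "a" occupies physical blocks 0 and 2 with sequence "b" in between, yet `resolve` still presents "a" as one contiguous logical history. A real serving runtime would attach tensor storage to each block and handle data movement between devices, which this sketch leaves out.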