RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention
RedKnot is a KV cache management system for large language model (LLM) serving that improves memory efficiency by managing the cache separately for each attention head rather than as one uniform block. This head-aware approach makes better use of available memory, allows more requests to be served concurrently, and scales to longer contexts, all without retraining the model.
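To make the contrast between a uniform block and per-head management concrete, the back-of-the-envelope sketch below compares KV cache sizes under two layouts. It assumes hypothetical per-head token budgets (some heads keep the full context, others only a short recent window). This is an illustrative assumption, not RedKnot's actual reuse policy, and the function names are hypothetical. The leading factor of 2 accounts for storing both keys and values, and `bytes_per_elem=2` assumes FP16/BF16 storage.

```python
# Hypothetical illustration only: per-head budgets are made-up numbers,
# not RedKnot's retention or reuse policy.


def uniform_kv_bytes(num_layers: int, num_heads: int, seq_len: int,
                     head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes for K and V when every head in every layer caches all seq_len tokens."""
    return 2 * num_layers * num_heads * seq_len * head_dim * bytes_per_elem


def head_aware_kv_bytes(tokens_per_head: list[list[int]], head_dim: int,
                        bytes_per_elem: int = 2) -> int:
    """Bytes for K and V when head h of layer l caches tokens_per_head[l][h] tokens."""
    total_tokens = sum(sum(layer) for layer in tokens_per_head)
    return 2 * total_tokens * head_dim * bytes_per_elem


if __name__ == "__main__":
    seq_len, head_dim = 8192, 128
    # Uniform layout: 2 layers x 4 heads, every head stores all 8192 tokens.
    uniform = uniform_kv_bytes(num_layers=2, num_heads=4,
                               seq_len=seq_len, head_dim=head_dim)
    # Hypothetical head-aware layout: per layer, two heads keep the full
    # context and two keep only a 1024-token recent window.
    budgets = [[seq_len, seq_len, 1024, 1024] for _ in range(2)]
    head_aware = head_aware_kv_bytes(budgets, head_dim=head_dim)
    print(uniform, head_aware, head_aware / uniform)  # 33554432 18874368 0.5625
```

Under these assumed budgets the per-head layout needs about 56% of the uniform layout's memory. The freed capacity is what could go toward serving more concurrent requests or longer contexts.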