AIBullish · arXiv CS AI · 5d ago
🧠
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Researchers introduce FreeKV, a training-free framework that accelerates KV cache retrieval for large language models with long context windows. By using speculative retrieval to keep KV block selection off the critical path, together with hybrid KV layouts that streamline memory transfers, the system reaches up to 13x speedup over existing retrieval methods while maintaining near-lossless accuracy.
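For intuition, here is a minimal Python sketch of the speculative-retrieval idea only; the names, block granularity, and top-k budget below are illustrative assumptions, not FreeKV's actual code. Because consecutive decode steps tend to select similar KV blocks, the previous step's selection can be prefetched while the current step computes, leaving only the mismatched blocks as corrective fetches on the critical path.

import numpy as np

NUM_BLOCKS, DIM, TOP_K = 64, 128, 8  # assumed sizes, for illustration only

def select_blocks(query, centroids, k=TOP_K):
    # Rank KV blocks by query / key-centroid similarity and keep the top-k.
    return set(np.argsort(centroids @ query)[-k:])

rng = np.random.default_rng(0)
centroids = rng.standard_normal((NUM_BLOCKS, DIM))  # one summary vector per KV block
prefetched = set()  # speculation: the blocks recalled for the previous step
for step in range(4):
    q = rng.standard_normal(DIM)
    selected = select_blocks(q, centroids)
    misses = selected - prefetched  # only these transfers block the current step
    print(f"step {step}: {len(selected - misses)} prefetch hits, {len(misses)} misses")
    prefetched = selected  # reuse this selection as next step's prefetch guess

In a real system the prefetch would run as an asynchronous CPU-to-GPU copy overlapped with the current step's computation; the sketch only models which blocks hit or miss the speculation.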