AI · Bullish · arXiv - CS AI · 5d ago · 6/103
PiKV: KV Cache Management System for Mixture of Experts
Researchers have introduced PiKV, an open-source KV cache management framework designed to reduce the memory and communication costs of Mixture of Experts (MoE) language models during multi-GPU and multi-node inference. The system combines expert-sharded storage, intelligent routing, adaptive scheduling, and compression to make large-scale model deployment more efficient.
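To make the idea of expert-sharded storage concrete, here is a minimal sketch of a KV cache split into one bounded shard per expert. It is an illustration only, not PiKV's API: the class `ExpertShardedKVCache`, the `route` helper, the modulo routing rule, and the LRU eviction policy are all assumptions chosen for brevity. A real system would route with learned gating scores and store tensors rather than strings.

```python
from collections import OrderedDict


class ExpertShardedKVCache:
    """Toy KV cache with one bounded LRU shard per expert.

    Hypothetical illustration; this is not PiKV's actual interface.
    """

    def __init__(self, num_experts: int, capacity_per_expert: int):
        self.capacity = capacity_per_expert
        # One independent shard per expert; in a multi-GPU setup each
        # shard could live on the device that hosts its expert.
        self.shards = [OrderedDict() for _ in range(num_experts)]

    def put(self, expert_id: int, token_pos: int, key, value) -> None:
        shard = self.shards[expert_id]
        shard[token_pos] = (key, value)
        shard.move_to_end(token_pos)      # mark as most recently used
        if len(shard) > self.capacity:
            shard.popitem(last=False)     # evict the least recently used entry

    def get(self, expert_id: int, token_pos: int):
        shard = self.shards[expert_id]
        if token_pos not in shard:
            return None
        shard.move_to_end(token_pos)      # a read also refreshes recency
        return shard[token_pos]


def route(token_pos: int, num_experts: int) -> int:
    """Stand-in router (assumption): real MoE routers use learned gates."""
    return token_pos % num_experts
```

With two experts and a capacity of two entries per shard, inserting positions 0 through 5 sends the even positions to expert 0 and the odd ones to expert 1, and each shard keeps only its two most recent entries. Sharding by expert means a lookup touches only the shard of the expert that handles the token, which is the property that would limit cross-device traffic in a distributed setting.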