AI · Bullish · arXiv – CS AI · 18h ago · 7/10
FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training
FlashCP is a new framework that improves context parallelism for training large language models by addressing workload imbalance and inefficient communication. The approach introduces load-balanced sharding strategies and eliminates redundant key-value tensor communication, delivering up to 1.63x speedup over existing methods.
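To make the workload-imbalance problem concrete, here is a minimal sketch of why a naive split of a causally masked sequence is uneven across ranks. It is a generic illustration, not FlashCP's actual sharding strategy, which the summary does not describe. The function names are hypothetical, and the "zigzag" pairing shown is one commonly used balancing scheme, assumed here purely for comparison.

```python
def causal_work(token_indices):
    # Under a causal mask, token i attends to tokens 0..i,
    # so it costs i + 1 query-key interactions.
    return sum(i + 1 for i in token_indices)


def contiguous_shards(seq_len, ranks):
    # Naive split: rank r gets one contiguous block of the sequence.
    # Assumes seq_len is divisible by ranks.
    size = seq_len // ranks
    return [list(range(r * size, (r + 1) * size)) for r in range(ranks)]


def zigzag_shards(seq_len, ranks):
    # Illustrative balancing scheme (an assumption, not FlashCP's method):
    # cut the sequence into 2 * ranks chunks and give rank r
    # chunk r plus its mirror chunk from the end.
    # Assumes seq_len is divisible by 2 * ranks.
    size = seq_len // (2 * ranks)
    chunks = [list(range(c * size, (c + 1) * size)) for c in range(2 * ranks)]
    return [chunks[r] + chunks[2 * ranks - 1 - r] for r in range(ranks)]


def imbalance(shards):
    # Ratio of the busiest rank's work to the mean work; 1.0 is perfectly balanced.
    work = [causal_work(s) for s in shards]
    return max(work) / (sum(work) / len(work))


if __name__ == "__main__":
    print("contiguous:", imbalance(contiguous_shards(16, 4)))
    print("zigzag:    ", imbalance(zigzag_shards(16, 4)))
```

With 16 tokens on 4 ranks, the contiguous split leaves the last rank with far more attention work than the first, while the mirrored pairing gives every rank the same amount. This toy model counts only attention interactions and ignores communication cost, which is the other inefficiency the summary says FlashCP targets.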