Source: arXiv CS AI
Efficient Training on Multiple Consumer GPUs with RoundPipe
Researchers introduce RoundPipe, a pipeline scheduling algorithm for efficient fine-tuning of large language models on consumer-grade GPUs. It eliminates the weight binding constraint, which causes computational bottlenecks. The system achieves speedups of 1.48x to 2.16x over existing approaches and enables fine-tuning of models with up to 235 billion parameters on such hardware.