AI · Bullish · arXiv – CS AI · 18h ago · 7/10
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads
Researchers have developed a method that speeds up multi-GPU machine learning training by running computation and communication concurrently, using shared-memory allocation and scheduling-priority adjustments. The technique reduces execution time by up to 25.5% on NVIDIA and AMD GPUs without modifying vendor libraries.
🏢 Nvidia