Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation
Researchers propose a basis rotation framework to address gradient staleness in asynchronous pipeline parallelism, a technique used for distributed AI training. By aligning the optimizer's coordinate system with the Hessian eigenbasis, the method reduces training iterations by 81.7% compared to existing asynchronous baselines, enabling more efficient large-scale model training.