AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Worker Disagreement Reveals Sharp Directions in Local SGD
Researchers show that disagreement among workers in Local SGD training reflects the underlying loss geometry of deep neural networks, offering a computationally cheap way to estimate the dominant (sharpest) Hessian directions without computing the Hessian directly. The finding could inform the optimization of distributed training for large models such as Transformers.
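To make the idea concrete, here is a toy sketch in Python, not the paper's estimator (the summary does not specify one). It runs Local SGD on a synthetic quadratic loss, takes the top principal component of the workers' deviations from their average, and compares it with the top Hessian eigenvector. Everything here is an assumption made for illustration: the quadratic loss, the gradient-noise covariance being proportional to the Hessian, the hyperparameters, and the helper names `local_sgd_deviations` and `top_direction`.

```python
import numpy as np


def local_sgd_deviations(H, n_workers=128, local_steps=20, lr=0.09, seed=0):
    """Run Local SGD on the toy loss 0.5 * w^T H w and return each worker's
    deviation from the worker average after `local_steps` local updates.

    Assumption (illustrative only): gradient noise has covariance equal to H.
    """
    rng = np.random.default_rng(seed)
    dim = H.shape[0]

    # Matrix square root of H, used to draw noise with covariance H.
    evals, evecs = np.linalg.eigh(H)
    sqrt_H = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T

    # All workers start from the same point, as after a synchronization round.
    w0 = rng.normal(size=dim)
    workers = np.tile(w0, (n_workers, 1))

    for _ in range(local_steps):
        noise = rng.normal(size=(n_workers, dim)) @ sqrt_H
        grads = workers @ H + noise  # H is symmetric, so each row equals H @ w
        workers = workers - lr * grads

    return workers - workers.mean(axis=0)


def top_direction(deviations):
    """Unit vector along the top principal component of the deviations."""
    _, _, vt = np.linalg.svd(deviations, full_matrices=False)
    return vt[0]


# Synthetic Hessian with one sharp direction (eigenvalue 20) and a flat tail.
rng = np.random.default_rng(1)
eigenvalues = np.array([20.0, 1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01])
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
H = Q @ np.diag(eigenvalues) @ Q.T
H = (H + H.T) / 2  # remove floating-point asymmetry

deviations = local_sgd_deviations(H)
estimated = top_direction(deviations)

# Reference answer from a direct eigendecomposition (eigh sorts ascending).
true_top = np.linalg.eigh(H)[1][:, -1]

# Absolute cosine similarity, since the sign of a direction is arbitrary.
alignment = abs(float(estimated @ true_top))
print(f"alignment with top Hessian eigenvector: {alignment:.3f}")
```

The worker-side estimate needs only the parameter vectors that Local SGD already exchanges at synchronization, whereas the reference answer requires the full Hessian. Note that the alignment in this toy depends on the noise assumption: with isotropic gradient noise, the deviations would instead concentrate in flat directions.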