arXiv – CS AI
Model Parallelism With Subnetwork Data Parallelism
Researchers introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into structured subnetworks and trains them across workers without exchanging activations. The approach reduces memory consumption by 28-60% during neural network pre-training. It supports both backward and forward masking regimes and maintains or improves performance across transformer and CNN architectures.
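To make the idea concrete, here is a toy sketch in plain Python. It is not the paper's implementation, and every detail is an assumption made for illustration: the function names (`make_masks`, `worker_grad`, `aggregate`), the cyclic block assignment of hidden units to workers, the one-hidden-layer model, and the rule of averaging each unit's gradient over the workers that own it. "Backward masking" is read here as running the full forward pass while computing gradients only for owned units, and "forward masking" as also dropping non-owned units from the forward pass.

```python
def make_masks(num_units, num_workers, keep):
    """Give each worker a contiguous (cyclic) block of `keep` hidden units.

    Assumed scheme for illustration; the summary does not say how
    subnetworks are constructed.
    """
    stride = num_units // num_workers
    masks = []
    for w in range(num_workers):
        start = w * stride
        kept = {(start + i) % num_units for i in range(keep)}
        masks.append([1 if u in kept else 0 for u in range(num_units)])
    return masks


def worker_grad(w1, w2, x, target, mask, forward_masking):
    """Gradient of 0.5 * (y - target)**2 with respect to w2.

    Toy model: y = sum_u w2[u] * relu(w1[u] * x), with scalar input x.
    """
    h = [max(0.0, a * x) for a in w1]
    if forward_masking:
        # Assumed reading: non-owned units are skipped in the forward pass too.
        h = [hu * m for hu, m in zip(h, mask)]
    y = sum(b * hu for b, hu in zip(w2, h))
    err = y - target
    # In both regimes the worker only produces gradients for units it owns.
    return [err * hu * m for hu, m in zip(h, mask)]


def aggregate(grads, masks):
    """Average each unit's gradient over the workers whose mask covers it."""
    num_units = len(masks[0])
    out = []
    for u in range(num_units):
        owners = [g[u] for g, m in zip(grads, masks) if m[u]]
        out.append(sum(owners) / len(owners) if owners else 0.0)
    return out


# Four workers, eight hidden units, each worker owns half of them.
masks = make_masks(num_units=8, num_workers=4, keep=4)
w1 = [1.0] * 8
w2 = [0.5] * 8
grads = [worker_grad(w1, w2, 1.0, 0.0, m, forward_masking=False) for m in masks]
update = aggregate(grads, masks)
```

In this sketch the workers exchange only per-parameter gradients, never hidden activations, and each worker needs gradient storage only for the units its mask covers. The actual memory saving depends on how the paper builds and sizes its subnetworks, which this toy does not attempt to reproduce.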