#model-parallelism News & Analysis

2 articles tagged with #model-parallelism. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

FoMoE is a distributed training system for Mixture-of-Experts (MoE) architectures that removes the requirement for every worker to hold a full replica of the model, partitioning the experts across workers instead. The approach reports up to a 1.42x reduction in communication cost and a 45x improvement over traditional distributed training, enabling efficient LLM pre-training across geographically dispersed commodity hardware.
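
To make the idea of partitioning experts across workers concrete, here is a minimal sketch in Python. It is not FoMoE's actual algorithm: the round-robin placement policy, the function names, and all sizes are assumptions chosen only for illustration. It compares how many parameters each worker stores when experts are split up with how many a full replica would need.

```python
def partition_experts(num_experts: int, num_workers: int) -> dict[int, list[int]]:
    """Assign expert ids to workers round-robin (a hypothetical placement policy)."""
    assignment: dict[int, list[int]] = {w: [] for w in range(num_workers)}
    for expert_id in range(num_experts):
        assignment[expert_id % num_workers].append(expert_id)
    return assignment


def params_per_worker(
    assignment: dict[int, list[int]], expert_params: int, shared_params: int
) -> dict[int, int]:
    """Parameters stored per worker: shared layers plus only its own experts."""
    return {
        worker: shared_params + len(expert_ids) * expert_params
        for worker, expert_ids in assignment.items()
    }


# Illustrative sizes only; none of these numbers come from the paper.
NUM_EXPERTS, NUM_WORKERS = 8, 4
EXPERT_PARAMS, SHARED_PARAMS = 100, 50

assignment = partition_experts(NUM_EXPERTS, NUM_WORKERS)
partitioned = params_per_worker(assignment, EXPERT_PARAMS, SHARED_PARAMS)
full_replica = SHARED_PARAMS + NUM_EXPERTS * EXPERT_PARAMS

print(assignment)
print(partitioned[0], "parameters per worker vs", full_replica, "for a full replica")
```

In this toy setup each worker holds two of the eight experts, so its footprint is the shared layers plus a quarter of the expert parameters. How the real system routes tokens and synchronizes shared layers is not covered by the summary and is left out here.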

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Model Parallelism With Subnetwork Data Parallelism

Researchers introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into structured subnetworks trained across workers without exchanging activations, reducing memory consumption by 28-60% during neural network pre-training. The method supports both backward and forward masking regimes and maintains or improves performance across transformer and CNN architectures.
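
As a rough illustration of training structured subnetworks on different workers, the Python sketch below gives each worker a mask over hidden units and averages each unit's gradient only over the workers that train it. This is not the paper's method: the overlapping-window mask layout, the per-unit averaging rule, and every name and number are assumptions made for the example.

```python
def make_masks(num_units: int, num_workers: int, keep_fraction: float) -> list[list[bool]]:
    """Give each worker a wrapped, contiguous window of hidden units (hypothetical layout)."""
    window = max(1, round(num_units * keep_fraction))
    stride = num_units / num_workers
    masks = []
    for worker in range(num_workers):
        start = int(worker * stride)
        kept = {(start + i) % num_units for i in range(window)}
        masks.append([unit in kept for unit in range(num_units)])
    return masks


def masked_average(grads: list[list[float]], masks: list[list[bool]]) -> list[float]:
    """Average each unit's gradient over only the workers whose mask includes it."""
    averaged = []
    for unit in range(len(masks[0])):
        values = [g[unit] for g, m in zip(grads, masks) if m[unit]]
        averaged.append(sum(values) / len(values) if values else 0.0)
    return averaged


# Illustrative setup: 8 hidden units, 4 workers, each training half of the units.
masks = make_masks(num_units=8, num_workers=4, keep_fraction=0.5)
coverage = [sum(m[unit] for m in masks) for unit in range(8)]

# Stand-in gradients: worker w reports the constant value w + 1 for every unit.
grads = [[float(worker + 1)] * 8 for worker in range(4)]
averaged = masked_average(grads, masks)

print(coverage)
print(averaged)
```

With these settings every unit is covered by exactly two workers, and each worker only needs gradients and optimizer state for its own half of the units, which is the kind of memory saving the summary describes. Only parameter gradients are combined in this sketch; no activations cross worker boundaries.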