y0news

#data-loading News & Analysis

2 articles tagged with #data-loading. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training

Researchers developed MegaScale-Data, an industrial-grade distributed data loading architecture that significantly improves training efficiency for large foundation models using multiple data sources. The system achieves up to 4.5x training throughput improvement and 13.5x reduction in CPU memory usage through disaggregated preprocessing and centralized data orchestration.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

GetBatch: Distributed Multi-Object Retrieval for ML Data Loading

Researchers introduce GetBatch, a new object store API that optimizes machine learning data loading by replacing thousands of individual GET requests with a single batch operation. The system achieves up to 15x throughput improvement for small objects and reduces batch retrieval latency by 2x in production ML training workloads.
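
To make the batching idea concrete, here is a minimal sketch of why one batch call can replace many individual GETs. It is only a toy model: `InMemoryStore`, `get`, `get_batch`, `load_individually`, and `load_batched` are hypothetical names chosen for illustration and are not the GetBatch API, and the store simply counts round trips rather than measuring real throughput or latency.

```python
class InMemoryStore:
    """Toy object store that counts round trips (hypothetical, not GetBatch's API)."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.requests = 0  # number of simulated round trips to the store

    def get(self, key):
        # One round trip per object, as with individual GET requests.
        self.requests += 1
        return self._objects[key]

    def get_batch(self, keys):
        # One round trip for the whole batch; results keep the order of `keys`.
        self.requests += 1
        return [self._objects[k] for k in keys]


def load_individually(store, keys):
    # Baseline: issue one request per sample in the training batch.
    return [store.get(k) for k in keys]


def load_batched(store, keys):
    # Batched: issue a single request for all samples in the training batch.
    return store.get_batch(keys)
```

In this model, loading a batch of N small objects costs N round trips with `load_individually` and one with `load_batched`. Per-request overhead dominates for small objects, which is the intuition behind batching them.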