y0news

#pre-training-efficiency News & Analysis

1 article tagged with #pre-training-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 article
AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

Researchers find that data mixture optimization for AI model pre-training fails to transfer to larger scales because of "repetition mismatch": when high-quality datasets are small, the number of times they are repeated changes as the training budget grows, which invalidates small-scale experiments. A subsampling procedure that controls for the target repetition rates enables accurate mixture prediction using only 1/16 of the tokens, whereas traditional methods require 44-94% of the full budget.
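The arithmetic behind repetition mismatch can be illustrated with a minimal sketch. The summary does not describe the paper's actual procedure, so this assumes the simplest reading: a dataset's repetition rate is its share of the token budget divided by its size, and subsampling shrinks the dataset in proportion to the budget so that the rate in the small run equals the rate in the full run. The function names and all numbers are hypothetical, chosen only for illustration.

```python
def epochs(dataset_tokens, weight, budget_tokens):
    """Passes over a dataset that receives `weight` of the token budget."""
    return weight * budget_tokens / dataset_tokens


def subsampled_size(dataset_tokens, small_budget, target_budget):
    """Dataset size for the small run that reproduces the target repetition rate.

    Assumption (not taken from the paper): scaling the dataset by
    small_budget / target_budget keeps weight * budget / size constant
    for every mixture weight.
    """
    return dataset_tokens * small_budget / target_budget


# Hypothetical numbers, for illustration only.
HQ_TOKENS = 10_000_000_000        # small high-quality dataset
TARGET_BUDGET = 160_000_000_000   # full training budget
SMALL_BUDGET = TARGET_BUDGET // 16  # proxy run at 1/16 of the tokens
WEIGHT = 0.25                     # mixture share given to the high-quality data

# Repetition at the full budget versus a naive small-scale run.
target_epochs = epochs(HQ_TOKENS, WEIGHT, TARGET_BUDGET)
naive_epochs = epochs(HQ_TOKENS, WEIGHT, SMALL_BUDGET)

# Subsample the dataset so the small run repeats it as often as the full run.
matched_size = subsampled_size(HQ_TOKENS, SMALL_BUDGET, TARGET_BUDGET)
matched_epochs = epochs(matched_size, WEIGHT, SMALL_BUDGET)

print(f"target run:        {target_epochs:.2f} epochs")
print(f"naive small run:   {naive_epochs:.2f} epochs")
print(f"matched small run: {matched_epochs:.2f} epochs")
```

With these numbers the naive small run sees the high-quality data for only a fraction of an epoch while the full run repeats it several times, so the two runs are not measuring the same mixture. Subsampling restores the match under the stated assumption.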