AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining
Researchers demonstrate that the composition of the synthetic corpus significantly affects foundation model pretraining for time series forecasting, with a 2× performance gap between the best and worst generators. An equal-weight mixture of all generators consistently outperforms individual generators across different model architectures, suggesting that how the corpus is composed matters more than which single generator is selected.
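To make the "equal-weight mixture" idea concrete, here is a minimal Python sketch. It is not the paper's pipeline: the three generators (`sine_wave`, `random_walk`, `linear_trend`), the function `build_equal_weight_corpus`, and the choice to define "equal weight" as an equal number of series per generator are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Series = List[float]
Generator = Callable[[random.Random, int], Series]


# Hypothetical toy generators; the generators used in the paper are not listed here.
def sine_wave(rng: random.Random, length: int) -> Series:
    period = rng.uniform(8, 32)
    phase = rng.uniform(0, 2 * math.pi)
    return [math.sin(2 * math.pi * t / period + phase) for t in range(length)]


def random_walk(rng: random.Random, length: int) -> Series:
    value, out = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        out.append(value)
    return out


def linear_trend(rng: random.Random, length: int) -> Series:
    slope = rng.uniform(-1, 1)
    return [slope * t + rng.gauss(0, 0.1) for t in range(length)]


GENERATORS: Dict[str, Generator] = {
    "sine_wave": sine_wave,
    "random_walk": random_walk,
    "linear_trend": linear_trend,
}


def build_equal_weight_corpus(
    generators: Dict[str, Generator],
    per_generator: int,
    length: int,
    seed: int = 0,
) -> List[Tuple[str, Series]]:
    """Draw the same number of series from every generator, then shuffle.

    Assumption: "equal weight" means equal series counts per generator.
    """
    rng = random.Random(seed)
    corpus: List[Tuple[str, Series]] = []
    for name, generate in generators.items():
        for _ in range(per_generator):
            corpus.append((name, generate(rng, length)))
    rng.shuffle(corpus)  # interleave generators so batches see a mix
    return corpus


corpus = build_equal_weight_corpus(GENERATORS, per_generator=100, length=64)
print(Counter(name for name, _ in corpus))
```

The contrast with "picking" is that a single-generator corpus would pass only one entry of `GENERATORS`; the mixture keeps every generator and gives each the same share of the corpus.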