y0news
AnalyticsDigestsSourcesRSSAICrypto
#automated-evolution1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 7h ago7/10
๐Ÿง 

Data Darwinism Part II: DataEvolve -- AI can Autonomously Evolve Pretraining Data Curation

Researchers introduced DataEvolve, an AI framework that autonomously evolves data curation strategies for pretraining datasets through iterative optimization. The system processed 672B tokens to create Darwin-CC dataset, which achieved superior performance compared to existing datasets like DCLM and FineWeb-Edu when training 3B parameter models.