AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
Researchers propose DeMix, a framework that uses model merging to determine effective data mixtures for large language model pre-training without an expensive training run for each candidate mixture. Because the search is decoupled from training cost, many more data combinations can be evaluated within the same budget. The authors also release a 22T-token dataset to support open research.
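
The idea in the summary above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that each data source has one already-trained component model, that a mixture is approximated by a weighted average of those models' parameters, and that candidates are scored by a user-supplied function. The names `merge_models`, `candidate_mixtures`, `search_mixture`, and `score_fn` are hypothetical.

```python
from itertools import product


def merge_models(models, weights):
    """Return the weighted average of parameter vectors (one per component model)."""
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    n_params = len(models[0])
    return [sum(w * m[i] for w, m in zip(norm, models)) for i in range(n_params)]


def candidate_mixtures(n_sources, steps):
    """Yield mixture weights on a simplex grid with `steps` subdivisions."""
    for combo in product(range(steps + 1), repeat=n_sources):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def search_mixture(models, score_fn, steps=4):
    """Pick the mixture whose merged model scores highest; no training happens here."""
    return max(
        candidate_mixtures(len(models), steps),
        key=lambda w: score_fn(merge_models(models, w)),
    )
```

In this sketch every candidate costs one merge and one evaluation, so the number of mixtures explored is limited by evaluation cost rather than by training cost. A real setup would merge full model checkpoints and score them on held-out benchmarks.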