AIBullish · arXiv — CS AI · 2d ago · 7/10
Training Language Models via Neural Cellular Automata
Researchers developed a method that uses neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to a 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural-language tokens while being more computationally efficient, and it transferred well to reasoning benchmarks.
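The summary doesn't describe the paper's NCA architecture, but the general idea behind a neural cellular automaton is a small shared network applied repeatedly to every cell's local neighborhood, so complex global patterns emerge from local rules. A minimal sketch, assuming a 1-D grid of cell-state vectors and an illustrative two-layer update network (all names and sizes are hypothetical, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 32    # number of cells (illustrative size)
STATE = 8    # state channels per cell
HIDDEN = 16  # hidden width of the per-cell update MLP

# Shared update-network weights: each cell sees the concatenated states
# of itself and its left/right neighbors (3 * STATE input features).
W1 = rng.normal(0, 0.1, (3 * STATE, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, STATE))

def nca_step(grid: np.ndarray) -> np.ndarray:
    """Apply one synchronous NCA update to every cell in the grid."""
    left = np.roll(grid, 1, axis=0)    # wrap-around (toroidal) neighbors
    right = np.roll(grid, -1, axis=0)
    perception = np.concatenate([left, grid, right], axis=1)
    hidden = np.tanh(perception @ W1)
    return grid + hidden @ W2          # residual update, common in NCAs

state = rng.normal(0, 1, (GRID, STATE))
for _ in range(10):
    state = nca_step(state)
```

In a synthetic-data setting like the one described, the evolving cell states would be decoded into token sequences for pre-training; that decoding step is not specified in the summary and is omitted here.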