SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization
Researchers introduce SimReg, an embedding similarity regularization technique for large language model pretraining that improves training efficiency by encouraging similar token representations to cluster together while separating different tokens. The approach achieves over 30% faster training convergence and 1% improvement in zero-shot performance across standard benchmarks.