AI · Bullish · arXiv CS AI · 7h ago · 7/10
SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training
Researchers introduce SPICE, a data selection algorithm that identifies and minimizes gradient conflicts between training samples, cutting the data needed to train a large language model by 90% while maintaining performance. The method combines information-theoretic principles with practical efficiency improvements, and models tuned on just 10% of a typical dataset remain effective across multiple benchmarks.
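
To make the idea of conflict-penalized selection concrete, here is a minimal sketch in Python. It is not the SPICE algorithm itself, which the summary does not spell out. It assumes each sample is represented by a gradient vector, uses the squared gradient norm as a stand-in for information, and penalizes a candidate whose gradient points against the sum of the gradients already chosen. The function name `select_low_conflict_subset`, the `penalty` parameter, and the scoring rule are all hypothetical.

```python
import numpy as np


def select_low_conflict_subset(grads, budget, penalty=1.0):
    """Greedily pick `budget` samples, penalizing gradient conflict.

    Illustrative scoring rule (an assumption, not taken from the paper):
      score(i) = ||g_i||^2 - penalty * max(0, -<g_i, sum of selected gradients>)
    """
    grads = np.asarray(grads, dtype=float)
    n_samples = grads.shape[0]
    info = np.einsum("ij,ij->i", grads, grads)  # squared norm of each gradient
    running_sum = np.zeros(grads.shape[1])      # sum of selected gradients
    available = np.ones(n_samples, dtype=bool)
    selected = []

    for _ in range(min(budget, n_samples)):
        # Conflict is positive only when a gradient opposes the selected set.
        conflict = np.maximum(0.0, -(grads @ running_sum))
        scores = info - penalty * conflict
        scores[~available] = -np.inf            # never re-pick a sample
        best = int(np.argmax(scores))           # ties go to the lowest index
        selected.append(best)
        available[best] = False
        running_sum += grads[best]

    return selected


# Toy example: sample 1 directly opposes sample 0, sample 2 is orthogonal.
toy_grads = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
print(select_low_conflict_subset(toy_grads, budget=2, penalty=5.0))
```

In the toy example, the second pick skips the sample whose gradient cancels the first and takes the orthogonal one instead. A 10% subset would correspond to setting `budget` to one tenth of the number of samples. This simple rule is not guaranteed to be submodular; it only illustrates the greedy, penalty-based shape of such a selection.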