arXiv · CS AI
Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning
Researchers introduce SAI-DPO, a dynamic data-sampling framework that selects training data according to the model's evolving capabilities during training, rather than relying on static metrics. Evaluated on mathematical reasoning benchmarks including AIME24 and AMC23, SAI-DPO achieves state-of-the-art performance with significantly less training data and outperforms baselines by nearly 6 points.
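To make "capability-aware sampling" concrete, here is a minimal sketch of the general idea. It is an illustrative assumption, not SAI-DPO's actual algorithm. The hypothetical `AdaptiveSampler` keeps a running success-rate estimate per problem from the model's own rollouts and weights sampling by p·(1 − p). That weight peaks for problems the model solves only some of the time and is near zero for problems it always or never solves. The class name, the prior of 0.5, and the weighting rule are all choices made for this example.

```python
import random
from collections import defaultdict


class AdaptiveSampler:
    """Hypothetical capability-aware sampler (illustrative only, not SAI-DPO's method)."""

    def __init__(self, problem_ids, prior=0.5, floor=1e-3):
        self.problem_ids = list(problem_ids)
        # Running estimate of the model's success rate on each problem (assumed prior).
        self.success = defaultdict(lambda: prior)
        # Number of rollouts observed so far for each problem.
        self.counts = defaultdict(int)
        # Small floor so no problem's sampling weight drops to exactly zero.
        self.floor = floor

    def update(self, problem_id, n_correct, n_attempts):
        """Fold new rollout results into the running success-rate estimate."""
        if n_attempts <= 0:
            return
        seen = self.counts[problem_id]
        total = seen + n_attempts
        # Pooled mean over all attempts seen so far; the prior is dropped once data arrives.
        correct_so_far = self.success[problem_id] * seen
        self.success[problem_id] = (correct_so_far + n_correct) / total
        self.counts[problem_id] = total

    def weight(self, problem_id):
        """p * (1 - p) is largest at p = 0.5, i.e. problems at the edge of the model's ability."""
        p = self.success[problem_id]
        return p * (1.0 - p) + self.floor

    def sample(self, k, rng=random):
        """Draw k problem ids (with replacement) in proportion to their current weights."""
        weights = [self.weight(pid) for pid in self.problem_ids]
        return rng.choices(self.problem_ids, weights=weights, k=k)


# Usage: after one round of rollouts, re-sample the next training batch.
sampler = AdaptiveSampler(["a", "b", "c"])
sampler.update("a", n_correct=10, n_attempts=10)  # always solved: low weight
sampler.update("b", n_correct=5, n_attempts=10)   # solved half the time: high weight
batch = sampler.sample(8, rng=random.Random(0))
```

In a real pipeline the updates would come from checking the model's sampled solutions after each training iteration. The batch composition then shifts as the model improves, which is the property the summary attributes to dynamic sampling in general.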