arXiv – CS AI · 9h ago
Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
Researchers introduce Dr. Post-Training, a framework that treats general training data as a regularizer for LLM post-training rather than as a pool from which to select examples. The method projects updates computed on target data onto a feasible set defined by the general data. The authors report improved performance across supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and reinforcement learning with verifiable rewards (RLVR) while keeping the computational cost low.
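The summary does not say how the feasible set is constructed. As a minimal sketch, assume it is the half-space of updates that do not increase the general-data loss to first order, the constraint used in A-GEM-style gradient projection. This is a swapped-in, well-known technique, not the paper's method; the helper names (`project_onto_general_halfspace`, `projected_sgd_step`) and the use of a single averaged general-data gradient are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_general_halfspace(
    g_target: Vector, g_general: Vector, eps: float = 1e-12
) -> Vector:
    """Project the target-data gradient onto {g : <g, g_general> >= 0}.

    Hypothetical feasible set (A-GEM-style): a descent step along -g then
    does not increase the general-data loss to first order. This is an
    illustrative assumption, not necessarily Dr. Post-Training's set.
    """
    overlap = dot(g_target, g_general)
    if overlap >= 0:
        # Already feasible: no first-order conflict with the general data.
        return list(g_target)
    norm_sq = dot(g_general, g_general)
    if norm_sq < eps:
        # Near-zero general gradient: skip projection to avoid dividing by ~0.
        return list(g_target)
    # Euclidean projection onto the boundary hyperplane <g, g_general> = 0.
    scale = overlap / norm_sq
    return [t - scale * g for t, g in zip(g_target, g_general)]


def projected_sgd_step(
    params: Vector, g_target: Vector, g_general: Vector, lr: float = 0.1
) -> Vector:
    """One plain SGD step using the projected target-data gradient (hypothetical)."""
    g = project_onto_general_halfspace(g_target, g_general)
    return [p - lr * gi for p, gi in zip(params, g)]


if __name__ == "__main__":
    # Conflicting gradients: the second component is removed by the projection.
    print(project_onto_general_halfspace([1.0, -1.0], [0.0, 1.0]))
    # Non-conflicting gradients pass through unchanged.
    print(project_onto_general_halfspace([1.0, 2.0], [0.0, 1.0]))
```

Under these assumptions, the projection only activates when the target and general gradients conflict, so ordinary target-data steps are left untouched and the extra cost per step is one general-data gradient plus an inner product.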