AI · Neutral · arXiv – CS AI · 6h ago · 6/10
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
Researchers propose a framework for supervised fine-tuning (SFT) of language models that treats training as target distribution design rather than plain token-likelihood maximization. The Q-target framework lets a model allocate probability mass flexibly across alternative tokens, unifies existing SFT variants, and shows consistent performance improvements across reasoning tasks.
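To make the "target distribution" view concrete, here is a minimal sketch in plain Python. It is not the paper's Q-target definition, which this summary does not give. It only illustrates the general idea that standard SFT is cross-entropy against a one-hot target, and that a different target redistributes probability mass across alternative tokens. The `smoothed_target` function, the `eps` parameter, and the example logits are all hypothetical, chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    # H(q, p) = -sum_v q(v) * log p(v); entries with zero target mass are skipped.
    return -sum(q * math.log(p) for q, p in zip(target, probs) if q > 0)

def one_hot_target(vocab_size, gold):
    # Standard SFT target: all probability mass on the reference token.
    return [1.0 if v == gold else 0.0 for v in range(vocab_size)]

def smoothed_target(vocab_size, gold, eps):
    # Hypothetical soft target (assumption, not from the paper):
    # 1 - eps on the reference token, eps spread uniformly over the rest.
    other = eps / (vocab_size - 1)
    return [1.0 - eps if v == gold else other for v in range(vocab_size)]

# Made-up logits for a 4-token vocabulary; token 0 is the reference token.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)
gold = 0

nll = -math.log(probs[gold])                                   # usual token NLL
hard_loss = cross_entropy(one_hot_target(4, gold), probs)      # same quantity
soft_loss = cross_entropy(smoothed_target(4, gold, 0.1), probs)

print(f"token NLL:       {nll:.4f}")
print(f"one-hot target:  {hard_loss:.4f}")
print(f"smoothed target: {soft_loss:.4f}")
```

With a one-hot target the cross-entropy reduces to the usual token negative log-likelihood, so the two formulations coincide. Swapping in another target changes what the model is rewarded for, which is the design choice the framework reportedly puts at the center.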