Proximal Supervised Fine-Tuning
Researchers propose Proximal Supervised Fine-Tuning (PSFT), a new method that applies trust-region constraints from reinforcement learning to improve how foundation models adapt to new tasks. The technique maintains model capabilities while fine-tuning, outperforming standard supervised fine-tuning on out-of-domain generalization tasks.