AI · Neutral · arXiv – CS AI · 3h ago · 6/10
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Researchers present a framework for analyzing how reinforcement learning (RL) and supervised fine-tuning (SFT) shape reasoning differently in large language models. The study finds that RL compresses incorrect reasoning paths while SFT expands correct ones, which helps explain why two-stage training that combines both methods yields stronger reasoning in models ranging from 1.5B to 14B parameters.