AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Researchers propose Supervised Reinforcement Learning (SRL), a training framework that helps small language models solve complex multi-step reasoning problems. The model generates an internal reasoning monologue before each step and receives step-wise rewards derived from expert trajectories. SRL outperforms standard Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) baselines, enabling smaller models to learn problems they previously could not.
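The idea of step-wise rewards from expert trajectories can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the summary. The helpers `split_expert_trajectory`, `build_step_examples`, `extract_action` and `step_reward` are hypothetical. Steps are taken to be non-empty lines. The monologue is assumed to be wrapped in `<think>...</think>`. Text similarity from the standard library's `difflib.SequenceMatcher` stands in for whatever reward the authors actually use. For step k, the model sees the problem plus expert steps 1..k-1 and is rewarded for how closely its next action matches expert step k.

```python
from difflib import SequenceMatcher


def split_expert_trajectory(solution: str) -> list[str]:
    # Hypothetical segmentation: treat each non-empty line of an expert
    # solution as one reasoning step.
    return [line.strip() for line in solution.splitlines() if line.strip()]


def build_step_examples(problem: str, steps: list[str]) -> list[tuple[str, str]]:
    # For step k, the context is the problem plus expert steps 1..k-1,
    # and the target is expert step k.
    examples = []
    for k, target in enumerate(steps):
        context = "\n".join([problem] + steps[:k])
        examples.append((context, target))
    return examples


def extract_action(model_output: str) -> str:
    # Assumed output format: an internal monologue inside <think>...</think>,
    # followed by the committed action. The monologue itself is not rewarded.
    marker = "</think>"
    if marker in model_output:
        return model_output.split(marker, 1)[1].strip()
    return model_output.strip()


def step_reward(model_output: str, expert_step: str) -> float:
    # Dense per-step reward in [0, 1]: similarity between the model's action
    # and the expert's action at the same step (stand-in metric).
    action = extract_action(model_output)
    return SequenceMatcher(None, action, expert_step).ratio()


if __name__ == "__main__":
    steps = split_expert_trajectory("x + 3 = 7\nx = 7 - 3\nx = 4")
    for context, target in build_step_examples("Solve x + 3 = 7.", steps):
        print(repr(context), "->", target)
    print(step_reward("<think>subtract 3 from both sides</think> x = 7 - 3", "x = 7 - 3"))
```

The design choice this sketch highlights is density. Every intermediate step yields its own reward signal, so a partially correct solution still produces useful feedback. A single reward for the final answer would give nothing in that case.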