arXiv – CS AI · Feb 27
Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
Researchers developed a two-stage framework for large reasoning models that curbs overthinking on simple queries while preserving accuracy on complex problems. Combining hybrid fine-tuning with adaptive reinforcement learning, the approach improved accuracy by up to 3.7 points while cutting the number of generated tokens by more than 40%.
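The summary does not spell out the shaping rule, so what follows is only a rough illustration of one common way to make an RL advantage length-aware: GRPO-style group-normalized rewards, with a linear penalty on correct answers that run longer than the group's shortest correct answer. The function name `length_shaped_advantages`, the linear penalty and the `alpha` weight are all assumptions for illustration, not the paper's method.

```python
from statistics import mean, pstdev


def length_shaped_advantages(rewards, lengths, alpha=0.2, eps=1e-6):
    """Hypothetical length-aware, group-relative advantages (not the paper's rule).

    rewards: 1.0 for a correct sampled response, 0.0 for a wrong one, all for the same prompt.
    lengths: token count of each sampled response.
    alpha:   assumed penalty weight; keep it below 1 so correct answers always outscore wrong ones.
    """
    correct_lengths = [n for r, n in zip(rewards, lengths) if r > 0]
    shaped = list(rewards)
    if correct_lengths:
        shortest = min(correct_lengths)
        span = max(max(correct_lengths) - shortest, 1)  # avoid division by zero
        for i, (r, n) in enumerate(zip(rewards, lengths)):
            if r > 0:
                # Penalty rises linearly from 0 (shortest correct) to alpha (longest correct).
                shaped[i] = r - alpha * (n - shortest) / span
    # Normalize within the group, as in GRPO-style advantage estimation.
    mu = mean(shaped)
    sigma = pstdev(shaped)
    return [(s - mu) / (sigma + eps) for s in shaped]


# Four sampled answers to one prompt: three correct (120, 300, 200 tokens), one wrong.
print(length_shaped_advantages([1.0, 1.0, 0.0, 1.0], [120, 300, 80, 200]))
```

Only correct answers are penalized for length, so a short wrong answer is never rewarded for brevity; when every answer in a group is wrong, all advantages are zero and the group contributes no gradient.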