
When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

arXiv – CS AI | Runze Liu, Jiashun Liu, Xu Wan, Yuqian Fu, Ling Pan | June 10, 2026 at 04:00 AM

🤖AI Summary

Researchers identify a critical problem in LLM post-training where excessive Supervised Fine-Tuning (SFT) reduces model plasticity, limiting subsequent Reinforcement Learning (RL) effectiveness. They propose 'Rejuvenation,' a method combining base-anchored model fusion and targeted neuron reset to restore plasticity while preserving SFT knowledge, demonstrating improved RL performance on reasoning and agentic tasks.

Analysis

The research addresses a core problem in modern LLM development: the sequential SFT-to-RL pipeline has become the industry standard, yet models that were over-trained during SFT often fail to benefit from subsequent RL optimization. The authors call this a loss of model plasticity and treat it as a bottleneck for post-training. They report empirical evidence that excessive SFT produces over-confident token distributions and sharp parameter landscapes that resist reshaping by RL.
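To make "over-confident token distributions" concrete, here is a minimal sketch of one common proxy: the mean entropy of the next-token distribution. The summary does not say how the authors measure over-confidence, so the choice of entropy, the function names, and the toy logits are all assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more confident distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(logits_per_position):
    """Average next-token entropy over all positions in a sequence."""
    values = [entropy(softmax(row)) for row in logits_per_position]
    return sum(values) / len(values)


# Toy logits over a 4-token vocabulary (hypothetical values, not from the paper).
moderate_sft = [[2.0, 1.5, 1.0, 0.5], [1.0, 1.2, 0.8, 0.9]]
heavy_sft = [[12.0, 1.5, 1.0, 0.5], [0.5, 14.0, 0.8, 0.9]]

print("moderate SFT entropy:", mean_token_entropy(moderate_sft))
print("heavy SFT entropy:", mean_token_entropy(heavy_sft))
```

A policy whose entropy has collapsed samples nearly the same tokens on every rollout, which is one plausible reason RL would have little to explore and reinforce.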

This work builds on growing recognition within the AI research community that training involves trade-offs between memorization and adaptability. Previous approaches focused on balancing SFT intensity or on the RL algorithms themselves, whereas this study points to the changes in token distributions and parameter landscapes that underlie the problem. Understanding model plasticity has implications for how practitioners calibrate training pipelines and manage the transition between the supervised and reinforcement learning phases.

For AI development teams and model builders, the Rejuvenation technique offers a practical solution that doesn't require fundamental architectural changes or expensive retraining from scratch. The method's effectiveness across both mathematical reasoning and agentic tasks suggests broad applicability. The demonstrated improvements on out-of-distribution generalization additionally indicate that rejuvenated models may be more robust to domain shifts, a critical consideration for production deployments.
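The two components named in the summary, base-anchored model fusion and targeted neuron reset, can be pictured with a toy sketch. This is not the authors' implementation: the summary gives no formulas, so the linear interpolation, the drift-based choice of which neurons to reset, the reset-to-base behavior, and every name and number below are assumptions.

```python
def fuse_with_base(base, sft, alpha):
    """Assumed fusion rule: per-weight interpolation between base and SFT.

    alpha = 1.0 keeps the SFT weights, alpha = 0.0 returns the base weights.
    """
    return [
        [alpha * s + (1.0 - alpha) * b for b, s in zip(base_row, sft_row)]
        for base_row, sft_row in zip(base, sft)
    ]


def reset_neurons(weights, base, k):
    """Assumed reset rule: restore the k rows (neurons) that drifted most.

    Drift is the squared distance between a neuron's weights and its base values.
    """
    drift = [
        sum((w - b) ** 2 for w, b in zip(w_row, b_row))
        for w_row, b_row in zip(weights, base)
    ]
    ranked = sorted(range(len(weights)), key=lambda i: drift[i], reverse=True)
    chosen = set(ranked[:k])
    return [
        list(base[i]) if i in chosen else list(weights[i])
        for i in range(len(weights))
    ]


# One toy layer with three neurons of two weights each (hypothetical values).
base_layer = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
sft_layer = [[0.2, 0.0], [3.0, 1.0], [2.0, 2.1]]

fused = fuse_with_base(base_layer, sft_layer, alpha=0.5)
rejuvenated = reset_neurons(fused, base_layer, k=1)
print(rejuvenated)
```

In this toy layer only the second neuron moved far from its base values, so it is the one restored, while the other two keep their fused weights. A real model would apply such a rule per parameter tensor and would need a principled selection criterion.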

Looking forward, this research may influence how organizations approach LLM post-training schedules and hyperparameter selection. Teams building state-of-the-art models may want to evaluate whether Rejuvenation or similar plasticity-preserving techniques should become standard practice, which could affect training timelines and compute requirements.

Key Takeaways

→Excessive SFT reduces model plasticity: over-confident token distributions and sharp loss landscapes prevent effective RL optimization
→Rejuvenation method combines base-anchored fusion with neuron reset to restore training adaptability while preserving learned priors
→The technique improves RL performance on over-trained models and enhances generalization to out-of-distribution tasks
→Model plasticity degradation is a previously underexplored failure mode in standard SFT-to-RL pipelines
→The findings suggest LLM post-training practices may need recalibration to balance knowledge acquisition with optimization flexibility

#llm-training #supervised-fine-tuning #reinforcement-learning #model-optimization #plasticity #post-training #neural-networks #training-methodology

