7 articles tagged with #supervised-fine-tuning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
Researchers propose a new framework called On-Policy SFT that bridges the performance gap between supervised fine-tuning and reinforcement learning in AI model training. The framework introduces Distribution Discriminant Theory (DDT) and two techniques - In-Distribution Finetuning and Hinted Decoding - that achieve better generalization while maintaining computational efficiency.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 4
Researchers introduce GraphSSR, a new framework that improves zero-shot graph learning by combining Large Language Models with adaptive subgraph denoising. The system addresses structural noise issues in existing methods through a dynamic 'Sample-Select-Reason' pipeline and reinforcement learning training.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
Researchers propose a new approach for training AI models to generate correct answers from demonstrations, using imitation learning in contextual bandits rather than traditional supervised fine-tuning. The method achieves better sample complexity and works with weaker assumptions about the underlying reward model compared to existing likelihood-maximization approaches.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
Researchers present a layer-wise analysis of Supervised Fine-Tuning (SFT) in large language models, revealing that middle layers remain stable during training while final layers exhibit high sensitivity. They introduce Mid-Block Efficient Tuning, a targeted approach that selectively updates intermediate layers and achieves up to 10.2% performance gains over standard LoRA on benchmarks with significantly reduced parameter overhead.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
A comprehensive research study examines the relationship between Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) methods for improving Large Language Models after pre-training. The research identifies emerging trends toward hybrid post-training approaches that combine both methods, analyzing applications from 2023-2025 to establish when each method is most effective.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8
New theoretical research analyzes how Large Language Models learn during pretraining versus post-training. It finds that balanced pretraining data creates latent capabilities that are activated later, that supervised fine-tuning works best on small, challenging datasets, and that reinforcement learning requires large-scale data that isn't overly difficult.