AI · Bullish · arXiv – CS AI · 11h ago · 7/10
SkillFactory: Self-Distillation For Learning Cognitive Behaviors
SkillFactory is a fine-tuning method that lets language models learn cognitive behaviors such as verification and backtracking without distillation from a stronger model. During supervised fine-tuning, it trains on samples the model itself has rearranged, which primes the model for subsequent reinforcement learning and results in better generalization and robustness.