AI · Bullish · arXiv – CS AI · 5h ago

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

Researchers developed a three-stage curriculum learning framework that improves distillation of Chain-of-Thought reasoning from large language models into smaller ones. The method enables Qwen2.5-3B-Base to achieve an 11.29% accuracy improvement while reducing output length by 27.4%, through progressive skill acquisition and Group Relative Policy Optimization (GRPO).
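The GRPO component referenced above works by sampling a group of completions per prompt and scoring each one against its own group's statistics rather than a learned value function. As a rough illustration only (this is not the paper's implementation, and the rewards here are made up), the group-relative advantage computation can be sketched as:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sampled completion's reward
    by the mean and standard deviation of its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Hypothetical rewards for 4 chain-of-thought samples on one prompt
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct samples get positive advantage, incorrect ones negative
```

Because the baseline is the group mean, no separate critic model is needed, which is part of what makes GRPO attractive for training smaller distilled models.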