AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers propose Early Stopping Rollout (ESR), a novel distillation technique that improves on-policy student model training by limiting rollout generation to initial response tokens. The method addresses "Off-policy Teacher Decay," where teachers lose effectiveness on later tokens, achieving better performance with higher GPU efficiency than standard approaches.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce ROPD, a rubric-based on-policy distillation framework that replaces teacher logits with structured semantic rubrics for model alignment. The approach achieves up to 10x better sample efficiency than logit-based methods while enabling distillation from proprietary black-box LLMs, addressing a critical scalability limitation in current model training.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce SOD (Step-wise On-policy Distillation), a framework that improves small language models' ability to use tools and reason through complex tasks by adaptively controlling how much they learn from larger teacher models at each step. The approach achieves up to 20.86% improvement over existing methods and demonstrates that a 0.6B parameter model can reach 26.13% accuracy on AIME 2025, a significant benchmark for mathematical reasoning.
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers introduce Hybrid Distillation Policy Optimization (HDPO), a new method that improves large language model training for mathematical reasoning by addressing 'cliff prompts' where standard reinforcement learning fails. The technique uses privileged self-distillation to provide learning signals for previously unsolvable problems, showing measurable improvements in coverage metrics while maintaining accuracy.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce MARVAL, a distillation framework that accelerates masked auto-regressive diffusion models by compressing inference into a single step while enabling practical reinforcement learning applications. The method achieves 30x speedup on ImageNet with comparable quality, making RL post-training feasible for the first time with these models.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce CA-DSSL, a new self-supervised learning technique that enables efficient training of AI models with under 500K parameters on microcontrollers. The method surpasses existing approaches by 18 percentage points on standard benchmarks while requiring significantly fewer parameters, and it reaches 94% of supervised learning performance with models deployable in just 378 KB of memory.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce Budgeted LoRA, a distillation framework that compresses large language models by treating model compression as a structured compute allocation problem. The method achieves up to 4.05x speedup in inference through selective dense component removal and adaptive low-rank allocation, controlled by a single compute budget parameter.
🏢 Perplexity
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers have developed DP-OPD (Differentially Private On-Policy Distillation), a new framework for training privacy-preserving language models that significantly improves performance over existing methods. The approach simplifies the training pipeline by eliminating the need for DP teacher training and offline synthetic text generation while maintaining strong privacy guarantees.
🏢 Perplexity
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers propose X-OPD, a Cross-Modal On-Policy Distillation framework to improve Speech Large Language Models by aligning them with text-based counterparts. The method uses token-level feedback from teacher models to bridge performance gaps in end-to-end speech systems while preserving inherent capabilities.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 24
🧠Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 22
🧠Researchers introduce EAGLE, a reinforcement learning framework that creates unified control policies for multiple different humanoid robots without per-robot tuning. The system uses iterative generalist-specialist distillation to enable a single AI controller to manage diverse humanoid embodiments and support complex behaviors beyond basic walking.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3
🧠DistillKac is a fast image generation method that uses damped wave equations and the Kac representation for finite-speed probability transport. Unlike diffusion models, whose reverse-time velocities can be unstable, this approach enforces bounded kinetic energy and offers improved numerical stability with fewer function evaluations.
AI · Neutral · Lil'Log (Lilian Weng) · Jan 10 · 5/10
🧠Large transformer models face significant inference optimization challenges due to high computational costs and memory requirements. The article discusses technical factors contributing to inference bottlenecks that limit real-world deployment at scale.