8 articles tagged with #distillation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv — CS AI · Mar 26 · 7/10
🧠 Researchers introduce Hybrid Distillation Policy Optimization (HDPO), a new method that improves large language model training for mathematical reasoning by addressing 'cliff prompts', where standard reinforcement learning fails. The technique uses privileged self-distillation to provide a learning signal on previously unsolvable problems, showing measurable improvements in coverage metrics while maintaining accuracy.
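The core idea can be illustrated with a toy objective (a hedged sketch; the function names and loss form here are my assumptions, not the paper's exact formulation): on a cliff prompt the RL advantage collapses to zero, but a KL term toward the model's own privileged (hint-conditioned) distribution still supplies a gradient.

```python
import math

def kl(p, q):
    """KL(p || q) between two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hdpo_style_loss(policy_probs, teacher_probs, advantage, beta=1.0):
    """Toy HDPO-style objective (illustrative only).

    When the advantage is zero -- the 'cliff prompt' case where plain
    policy-gradient RL provides no signal -- the distillation term from
    the privileged teacher still produces a nonzero loss to descend.
    """
    rl_term = -advantage * math.log(policy_probs[0])  # placeholder PG term
    distill_term = beta * kl(teacher_probs, policy_probs)
    return rl_term + distill_term
```

Even with `advantage=0.0`, the loss stays positive whenever the policy and the hinted teacher disagree, which is the mechanism the summary describes.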
AI · Bullish · arXiv — CS AI · Mar 17 · 7/10
🧠 Researchers introduce MARVAL, a distillation framework that accelerates masked auto-regressive diffusion models by compressing inference into a single step while enabling practical reinforcement learning applications. The method achieves a 30× speedup on ImageNet with comparable quality, making RL post-training of these models feasible for the first time.
AI · Bullish · arXiv — CS AI · Apr 7 · 6/10
🧠 Researchers have developed DP-OPD (Differentially Private On-Policy Distillation), a new framework for training privacy-preserving language models that significantly improves performance over existing methods. The approach simplifies the training pipeline by eliminating the need for DP teacher training and offline synthetic text generation while maintaining strong privacy guarantees.
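The standard mechanism behind most differentially private training is per-example gradient clipping plus Gaussian noise (a generic DP-SGD-style sketch; DP-OPD's exact mechanism may differ, and all names here are illustrative):

```python
import math
import random

def clip_to_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grad]
    return list(grad)

def private_mean_grad(per_example_grads, max_norm=1.0, noise_mult=1.0, seed=0):
    """DP-SGD-style noisy aggregation: clip each example's gradient,
    sum, add Gaussian noise scaled to the clipping bound, then average.
    The clipping bounds each example's influence; the noise hides it."""
    rng = random.Random(seed)
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_mult * max_norm
    n = len(per_example_grads)
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]
```

With `noise_mult=0` this reduces to an ordinary clipped-gradient average; the noise multiplier is what trades accuracy for the privacy budget.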
🟢 Perplexity
AI · Bullish · arXiv — CS AI · Mar 27 · 6/10
🧠 Researchers propose X-OPD, a Cross-Modal On-Policy Distillation framework that improves speech large language models by aligning them with their text-based counterparts. The method uses token-level feedback from teacher models to close performance gaps in end-to-end speech systems while preserving the speech models' existing capabilities.
AI · Bullish · arXiv — CS AI · Mar 2 · 7/10
🧠 Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.
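A distillation-based unlearning objective typically combines two pulls in one loss (a toy sketch, not DUET's actual formulation; every name here is illustrative): push the student's confidence on forget-set targets down while a KL term toward the original teacher keeps retain-set behavior intact.

```python
import math

def unlearning_loss(p_forget, teacher_retain, student_retain, alpha=1.0):
    """Toy distillation-based unlearning objective (illustrative only).

    Minimizing it drives the student's probability on forgotten targets
    toward zero (ascent on their log-likelihood), while the KL term
    anchors retain-set predictions to the teacher, preserving utility.
    """
    forget_term = math.log(p_forget)  # minimized -> p_forget pushed down
    retain_term = sum(p * math.log(p / q)
                      for p, q in zip(teacher_retain, student_retain) if p > 0)
    return forget_term + alpha * retain_term
```

The weight `alpha` sets the removal-vs-utility trade-off the summary refers to: larger values favor matching the teacher on retained knowledge.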
AI · Bullish · arXiv — CS AI · Mar 2 · 7/10
🧠 Researchers introduce EAGLE, a reinforcement learning framework that creates unified control policies for multiple different humanoid robots without per-robot tuning. The system uses iterative generalist-specialist distillation to enable a single AI controller to manage diverse humanoid embodiments and support complex behaviors beyond basic walking.
AI · Neutral · arXiv — CS AI · Mar 3 · 4/10
🧠 DistillKac introduces a fast image generation method that uses damped wave equations and the Kac representation for finite-speed probability transport. Unlike diffusion models, whose reverse-time velocities can be unstable, this approach enforces bounded kinetic energy and offers improved numerical stability with fewer function evaluations.
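For background on the "damped wave equation" and "finite-speed transport" claims: the classical setting is the one-dimensional telegrapher's equation and Kac's probabilistic representation of it (standard textbook material, stated here in generic notation that may differ from the paper's):

```latex
% Telegrapher's (damped wave) equation in one dimension:
\partial_t^2 u + 2a\,\partial_t u = c^2\,\partial_x^2 u
% Kac's representation: let N(t) be a Poisson process of rate a and
%   X(t) = x + c \int_0^t (-1)^{N(s)}\,ds ,
% a particle moving at speed c that reverses direction at Poisson times.
% Then u(x,t) = \mathbb{E}\!\left[f\bigl(X(t)\bigr)\right] solves the
% equation for suitable initial data.
```

Because `|X(t) - x| <= c t` almost surely, probability mass propagates at speed at most `c`, in contrast to the unbounded instantaneous velocities that can arise in reverse-time diffusion sampling.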
AI · Neutral · Lil'Log (Lilian Weng) · Jan 10 · 5/10
🧠 Large transformer models face significant inference optimization challenges due to high computational costs and memory requirements. The article discusses technical factors contributing to inference bottlenecks that limit real-world deployment at scale.
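One concrete driver of those memory requirements is the KV cache. A back-of-the-envelope estimate (a sketch; the function and parameter names are illustrative, and the config below is a generic 7B-class shape rather than any specific model):

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """Rough KV-cache footprint for a decoder-only transformer.

    Each layer stores one K and one V tensor of shape
    [batch, heads, seq_len, head_dim], at bytes_per_elem per value
    (2 for fp16/bf16).
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Example: 32 layers, 32 heads, head_dim 128, 4096-token context, fp16
# -> exactly 2 GiB of cache per sequence, before any weights or activations.
size = kv_cache_bytes(32, 32, 128, 4096, 1)
```

Because this grows linearly in both sequence length and batch size, it is a major reason serving long contexts at scale is memory-bound, motivating techniques such as multi-query attention and cache quantization.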