AIBullisharXiv – CS AI · 3d ago7/10
🧠Researchers propose Guided Denoiser Self-Distillation (GDSD), a new reinforcement learning method for diffusion language models that eliminates the need for evidence lower bound approximations, achieving up to 19.6% performance improvements over existing approaches on planning, math, and coding tasks.
AIBullisharXiv – CS AI · 5d ago7/10
🧠Search-E1 introduces a simplified self-evolution method for search-augmented reasoning agents that achieves competitive performance through vanilla GRPO and self-distillation, without external supervision or complex auxiliary systems. The approach reaches 0.440 average EM on QA benchmarks with Qwen2.5-3B, demonstrating that elaborate post-training machinery may be unnecessary for effective agent development.
AINeutralarXiv – CS AI · Apr 207/10
🧠Researchers identify that supervised fine-tuning of large language models increases hallucinations by degrading pre-existing knowledge through semantic interference. The study proposes self-distillation and parameter freezing techniques to mitigate this problem while preserving task performance.
AIBullisharXiv – CS AI · Apr 137/10
🧠SkillFactory is a novel fine-tuning method that enables language models to learn cognitive behaviors like verification and backtracking without requiring distillation from stronger models. The approach uses self-rearranged training samples during supervised fine-tuning to prime models for subsequent reinforcement learning, resulting in better generalization and robustness.
AIBullisharXiv – CS AI · Mar 277/10
🧠Researchers developed GoldiCLIP, a data-efficient vision-language model that achieves state-of-the-art performance using only 30 million images - 300x less data than leading methods. The framework combines three key innovations including text-conditioned self-distillation, VQA-integrated encoding, and uncertainty-based loss weighting to significantly improve image-text retrieval tasks.
AIBullisharXiv – CS AI · Mar 267/10
🧠Researchers propose MTP-D, a self-distillation method that improves Multi-Token Prediction for Large Language Models, achieving 7.5% better acceptance rates and up to 220% inference speedup. The technique addresses key challenges in training multiple prediction heads while preserving main model performance.
AIBullisharXiv – CS AI · Mar 167/10
🧠Researchers developed a new method for training AI language models using multi-turn user conversations through self-distillation, leveraging follow-up messages to improve model alignment. Testing on real-world WildChat conversations showed improvements in alignment and instruction-following benchmarks while enabling personalization without explicit feedback.
AINeutralarXiv – CS AI · 3d ago6/10
🧠Researchers introduce OISD, a new reinforcement learning framework that improves language model reasoning by having the final layer act as an internal teacher to guide intermediate layers through logit and attention alignment. The method demonstrates consistent improvements across mathematical reasoning tasks without requiring external data.
AIBullisharXiv – CS AI · 3d ago6/10
🧠Researchers introduce TRACER, a novel finetuning method for multimodal AI models that addresses catastrophic forgetting and out-of-distribution robustness degradation. By replacing standard Exponential Moving Average teachers with Weighted Moving Average teachers and combining contrastive learning with multi-perspective distillation, the approach demonstrates consistent performance gains across CLIP backbone architectures without hyperparameter sensitivity.
AINeutralarXiv – CS AI · 4d ago6/10
🧠Researchers propose SC-SDPO, an improved machine learning technique that enhances how large language models learn from their own feedback during training. By weighting training examples based on question difficulty, the method achieves 3-4% performance gains on reasoning benchmarks while maintaining stable training dynamics.
AIBullisharXiv – CS AI · 4d ago6/10
🧠Researchers propose Skill-Conditioned Gated Self-Distillation (SGSD), a novel method for improving large language model reasoning by leveraging an experience-derived skill bank rather than trusted reference answers. The approach validates skills through a multi-teacher framework and demonstrates consistent improvements over existing methods on mathematical reasoning benchmarks.
AINeutralarXiv – CS AI · 4d ago6/10
🧠Researchers introduce Vision-OPD, a self-distillation framework that improves multimodal large language models' ability to detect fine-grained visual details by training full-image models to match the performance of crop-focused models. The technique achieves competitive results against larger models without requiring external teachers, labels, or inference-time tools, addressing a critical weakness in current MLLMs.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers introduce SHRED, a machine unlearning method for large language models that removes memorized private or copyrighted data without requiring a curated retain set of examples. By selectively demoting logits of high-information tokens while preserving model utility through self-distillation, SHRED achieves superior trade-offs between forgetting efficacy and performance compared to existing retain-set-dependent approaches.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers propose Multilingual Self-Distillation (MSD), a framework that transfers safety safeguards from high-resource languages like English to vulnerable low-resource languages in large language models. The method eliminates the need for expensive multilingual response data by leveraging an LLM's existing safety capabilities, demonstrating effective cross-lingual protection across diverse jailbreak benchmarks.
AINeutralarXiv – CS AI · May 96/10
🧠Researchers demonstrate that On-Policy Self-Distillation (OPSD) functions primarily as a compression mechanism rather than a correction tool for thinking-enabled mathematical reasoning models. They propose a revised training pipeline (SFT → RLVR → OPSD) that leverages OPSD's strengths in shortening responses while preserving accuracy on correct outputs.
AIBullisharXiv – CS AI · May 96/10
🧠Researchers introduce UniSD, a unified self-distillation framework that systematically improves large language model adaptation without requiring external teacher models. The framework combines multiple complementary mechanisms and demonstrates consistent performance gains of +5.4 points over baseline models across six benchmarks, advancing efficient LLM training techniques.
AINeutralarXiv – CS AI · Apr 206/10
🧠Researchers introduce Self-Distillation Fine-Tuning (SDFT), a framework that recovers performance degradation in Large Language Models caused by compression, quantization, and catastrophic forgetting. Using Centered Kernel Alignment analysis, the study demonstrates that self-distillation works by aligning the student model's high-dimensional manifold with the teacher model's optimal representation structure.
AIBullisharXiv – CS AI · Apr 146/10
🧠Researchers introduce Skill-SD, a novel training framework for multi-turn LLM agents that improves sample efficiency by converting successful agent trajectories into dynamic natural language skills that condition a teacher model. The approach combines reinforcement learning with self-distillation and achieves significant performance improvements over baseline methods on benchmark tasks.
AINeutralarXiv – CS AI · Mar 266/10
🧠Researchers introduce SPARE, a new machine unlearning method for text-to-image diffusion models that efficiently removes unwanted concepts while preserving model performance. The two-stage approach uses parameter localization and self-distillation to achieve selective concept erasure with minimal computational overhead.
AIBullisharXiv – CS AI · Mar 176/10
🧠Researchers introduce Truncated-Reasoning Self-Distillation (TRSD), a post-training method that enables AI language models to maintain accuracy while using shorter reasoning traces. The technique reduces computational costs by training models to produce correct answers from partial reasoning, achieving significant inference-time efficiency gains without sacrificing performance.
AIBullisharXiv – CS AI · Mar 37/106
🧠Researchers propose Attention Smoothing Unlearning (ASU), a new framework that helps Large Language Models forget sensitive or copyrighted content without losing overall performance. The method uses self-distillation and attention smoothing to erase specific knowledge while maintaining coherent responses, outperforming existing unlearning techniques.
AINeutralarXiv – CS AI · Mar 35/104
🧠Researchers developed UTICA, a new foundation model for time series classification that uses non-contrastive self-distillation methods adapted from computer vision. The model achieves state-of-the-art performance on UCR and UEA benchmarks by learning temporal patterns through a student-teacher framework with data augmentation and patch masking.