y0news

#knowledge-distillation News & Analysis

49 articles tagged with #knowledge-distillation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation

AsymTalker introduces a diffusion-based method for generating long-form talking head videos with consistent identity and synchronized audio. The approach solves critical challenges in extended video synthesis through temporal reference encoding and asymmetric knowledge distillation, achieving real-time performance at 66 FPS on videos up to 10 minutes long.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Skill1 presents a unified reinforcement learning framework that enables language model agents to co-evolve three coupled capabilities: skill selection, utilization, and distillation from a single task-outcome reward signal. Demonstrated improvements over existing baselines on complex tasks suggest advances in how AI agents can build and leverage persistent skill libraries across diverse problem domains.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix

Researchers propose a knowledge distillation method for multi-modal AI systems that transfers modality relationship information from teacher to student networks by learning the teacher's modality-level Gram matrix. This approach goes beyond existing methods that focus only on the final output, enabling deeper knowledge transfer across different data modalities.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

Researchers propose AdaRankLLM, an adaptive retrieval-augmented generation framework that dynamically filters irrelevant passages to reduce computational overhead while maintaining output quality. The study challenges whether adaptive retrieval remains necessary as language models grow more robust, finding that its value differs significantly between weaker and stronger models.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting

Researchers introduce Self-Distillation Fine-Tuning (SDFT), a framework that recovers performance lost in Large Language Models to compression, quantization, and catastrophic forgetting. Using Centered Kernel Alignment analysis, the study demonstrates that self-distillation works by aligning the student model's high-dimensional manifold with the teacher model's optimal representation structure.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Researchers propose trace rewriting techniques to protect language models from unauthorized knowledge distillation, a process where smaller models learn from larger ones without permission. The methods preserve model accuracy while making the rewritten traces less useful for distillation and embedding detectable watermarks in student models.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation

Researchers propose a novel framework for improving symbolic distillation of neural networks by regularizing teacher models for functional smoothness using Jacobian and Lipschitz penalties. This approach addresses the core challenge that standard neural networks learn complex, irregular functions while symbolic regression models prioritize simplicity, resulting in poor knowledge transfer. Results across 20 datasets demonstrate statistically significant improvements in predictive accuracy for distilled symbolic models.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

Researchers introduce WAND, a framework that reduces computational and memory costs of autoregressive text-to-speech models by replacing full self-attention with windowed attention combined with knowledge distillation. The approach achieves up to 66.2% KV cache memory reduction while maintaining speech quality, addressing a critical scalability bottleneck in modern AR-TTS systems.

AI · Bullish · Decrypt – AI · Apr 12 · 6/10

Want Claude Opus AI on Your Potato PC? This Is Your Next-Best Bet

A developer has created Qwopus, a distilled version of Claude Opus 4.6's reasoning capabilities embedded into a local Qwen model that runs on consumer hardware. The tool democratizes access to advanced AI reasoning by enabling users with modest computing resources to run sophisticated models locally, challenging the centralized AI infrastructure paradigm.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Knowledge Distillation for Large Language Models

Researchers developed a resource-efficient framework for compressing large language models using knowledge distillation and chain-of-thought reinforcement learning. The method successfully compressed Qwen 3B to 0.5B while retaining 70-95% of performance across English, Spanish, and coding tasks, making AI models more suitable for resource-constrained deployments.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Task-Specific Knowledge Distillation via Intermediate Probes

Researchers introduce a new knowledge distillation framework that improves training of smaller AI models by using intermediate representations from large language models rather than their final outputs. The method shows consistent improvements across reasoning benchmarks, particularly when training data is limited, by providing cleaner supervision signals.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Researchers propose Contract And Conquer (CAC), a new method for provably generating adversarial examples against black-box neural networks using knowledge distillation and search space contraction. The approach provides theoretical guarantees for finding adversarial examples within a fixed number of iterations and outperforms existing methods on ImageNet, including against vision transformers.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models

Researchers have developed KDFlow, a new framework for compressing large language models that achieves 1.44x to 6.36x faster training speeds compared to existing knowledge distillation methods. The framework uses a decoupled architecture that optimizes both training and inference efficiency while reducing communication costs through innovative data transfer techniques.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Distillation of Large Language Models via Concrete Score Matching

Researchers propose Concrete Score Distillation (CSD), a new knowledge distillation method that improves efficiency of large language models by better preserving logit information compared to traditional softmax-based approaches. CSD demonstrates consistent performance improvements across multiple models including GPT-2, OpenLLaMA, and GEMMA while maintaining training stability.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7

Decoder-based Sense Knowledge Distillation

Researchers have developed Decoder-based Sense Knowledge Distillation (DSKD), a new framework that integrates lexical resources into decoder-style large language models during training. The method enhances knowledge distillation performance while enabling generative models to inherit structured semantics without requiring dictionary lookup during inference.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Researchers propose RL-aware distillation (RLAD), a new method to efficiently transfer knowledge from large language models to smaller ones during reinforcement learning training. The approach uses Trust Region Ratio Distillation (TRRD) to selectively guide student models only when it improves policy updates, outperforming existing distillation methods across reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL

Researchers propose Struct-SQL, a knowledge distillation framework that improves Small Language Models for Text-to-SQL tasks by using structured Chain-of-Thought reasoning instead of unstructured approaches. The method achieves an 8.1% improvement over baseline distillation, primarily by reducing syntactic errors through formal query execution plan blueprints.

AI · Bullish · Hugging Face Blog · Aug 1 · 6/10 · 6

Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny

Segmind has open-sourced knowledge distillation code and model weights for SD-Small and SD-Tiny, making smaller and more efficient distilled versions of Stable Diffusion available to the community. This release enables developers to run image generation models with reduced computational requirements while maintaining reasonable quality.

AI · Neutral · arXiv – CS AI · Apr 6 · 4/10

Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization

Researchers developed the EWAD and CPDP techniques to improve multi-teacher knowledge distillation in low-resource abstractive summarization. The study, spanning Bangla and cross-lingual datasets, shows that logit-level knowledge distillation provides the most reliable gains, while more complex distillation improves short summaries but degrades longer outputs.

AI · Bullish · arXiv – CS AI · Mar 27 · 5/10

Neural Network Conversion of Machine Learning Pipelines

Researchers developed a method to transfer knowledge from traditional machine learning pipelines to neural networks, specifically converting random forest classifiers into student neural networks. Testing on 100 OpenML tasks showed that neural networks can successfully mimic random forest performance when proper hyperparameters are selected.

AI · Neutral · arXiv – CS AI · Mar 26 · 4/10

Powerful Teachers Matter: Text-Guided Multi-view Knowledge Distillation with Visual Prior Enhancement

Researchers propose Text-guided Multi-view Knowledge Distillation (TMKD), a new method that uses dual-modality teachers (visual and text) to improve knowledge transfer from large AI models to smaller ones. The approach enhances visual teachers with multi-view inputs and incorporates CLIP text guidance, achieving up to 4.49% performance improvements across five benchmarks.

AI · Neutral · arXiv – CS AI · Mar 16 · 4/10

Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Researchers introduce Steve-Evolving, a new AI framework for open-world embodied agents that uses fine-grained diagnosis and knowledge distillation to improve long-horizon task performance. The system organizes interaction experiences into structured tuples and continuously evolves without model parameter updates, showing improvements in Minecraft testing environments.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3

Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems

Researchers propose Rejuvenated Cross-Entropy for Knowledge Distillation (RCE-KD) to improve knowledge distillation in recommender systems by addressing limitations of the cross-entropy loss when distilling teacher model rankings. The method splits the teacher's top items into subsets and uses adaptive sampling to better align with the loss's theoretical assumptions.
