DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models
DASH introduces a dual-branch distillation framework for compressing class-conditional diffusion models while preserving the effectiveness of classifier-free guidance. By supervising the conditional and unconditional score branches independently, the method achieves 5.9x parameter compression while staying within 4 FID points of the original model. This addresses a critical limitation of existing distillation approaches, in which the guidance mechanism collapses during compression.
DASH addresses a fundamental challenge in neural network compression: preserving model functionality beyond raw output accuracy. Traditional distillation methods focus on matching teacher-student outputs at the final layer, but this approach fails for diffusion models that rely on dual-branch architectures where separate conditional and unconditional pathways must maintain distinct roles. The research identifies how unconditional score branches become unsupervised during compression, causing both branches to converge toward identical predictions and rendering classifier-free guidance ineffective despite acceptable loss metrics.
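The collapse described above follows directly from the standard classifier-free guidance combination, in which the guided prediction is the unconditional prediction plus a scaled difference between the two branches. The sketch below is a minimal illustration in plain Python, not code from the paper: the function name `cfg_combine` and the toy three-element predictions are assumptions made for this example.

```python
def cfg_combine(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance: eps_u + scale * (eps_c - eps_u)."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy predictions (hypothetical values, chosen only for illustration).
eps_c = [0.8, -0.2, 0.5]   # conditional branch
eps_u = [0.1, 0.3, 0.5]    # unconditional branch

# Healthy model: the branches disagree, so the guidance scale changes the output.
healthy_low = cfg_combine(eps_c, eps_u, 1.0)
healthy_high = cfg_combine(eps_c, eps_u, 4.0)

# Collapsed model: both branches return the same prediction, so the
# difference term is zero and the scale has no effect at all.
collapsed_low = cfg_combine(eps_c, eps_c, 1.0)
collapsed_high = cfg_combine(eps_c, eps_c, 4.0)

print("healthy differs with scale:", healthy_low != healthy_high)
print("collapsed differs with scale:", collapsed_low != collapsed_high)
```

When the two branches agree, the difference term vanishes, so raising the guidance scale cannot change the sample. A loss computed only on the conditional output would not detect this.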
This work emerges from broader efforts to make generative models more computationally efficient as diffusion-based systems gain adoption in production environments. Smaller, faster models enable deployment on edge devices and reduce inference costs, both critical factors for commercial viability. The work also introduces TIRT Transfer, which freezes the teacher's importance curriculum into the student so that the student does not have to relearn it, a good fit for the limited training budgets of compression workflows.
The technical contribution matters for AI infrastructure development and resource-constrained deployment scenarios. Organizations building real-time image generation systems, mobile applications, or latency-sensitive services benefit directly from maintaining guidance quality while reducing computational overhead. Achieving 5.9x compression while staying within 4 FID points of the original model indicates practical viability rather than purely theoretical interest.
Future research should explore whether these dual-branch constraints transfer to other architecture families and whether similar patterns exist in other conditional generation tasks beyond image synthesis. The emphasis on branch-specific supervision may reshape how practitioners approach distillation for multi-component neural systems.
- DASH preserves classifier-free guidance effectiveness during diffusion model compression through independent supervision of conditional and unconditional branches.
- Achieves 5.9x parameter compression with minimal quality loss (4 FID point gap), addressing practical deployment constraints.
- Identifies unconditional score branch supervision as the dominant contribution, accounting for 60% of total distillation gains.
- TIRT Transfer eliminates curriculum relearning by freezing teacher importance weights, reducing training overhead.
- Demonstrates that output-level distillation metrics alone inadequately measure guidance preservation in multi-branch architectures.
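
The branch-specific supervision summarized in the takeaways above can be sketched as a loss with one term per branch. This is a hedged illustration rather than the paper's actual objective: it assumes plain mean-squared-error matching on each branch, and the names `dual_branch_loss`, `guidance_gap`, and the weights `w_cond` and `w_uncond` are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dual_branch_loss(t_cond, t_uncond, s_cond, s_uncond, w_cond=1.0, w_uncond=1.0):
    """Supervise each student branch against the matching teacher branch.

    Assumption: MSE per branch with hypothetical weights; the paper's exact
    loss is not given in this summary.
    """
    return w_cond * mse(s_cond, t_cond) + w_uncond * mse(s_uncond, t_uncond)

def guidance_gap(cond, uncond):
    """Mean squared difference between branches; zero means guidance is inert."""
    return mse(cond, uncond)

# Toy teacher predictions (hypothetical values).
t_cond, t_uncond = [0.8, -0.2, 0.5], [0.1, 0.3, 0.5]

# A collapsed student: both branches copy the teacher's conditional output.
s_cond, s_uncond = list(t_cond), list(t_cond)

cond_only = mse(s_cond, t_cond)  # supervises the conditional branch only
dual = dual_branch_loss(t_cond, t_uncond, s_cond, s_uncond)

print("conditional-only loss:", cond_only)
print("dual-branch loss:", dual)
print("teacher gap:", guidance_gap(t_cond, t_uncond))
print("student gap:", guidance_gap(s_cond, s_uncond))
```

In this toy case the conditional-only loss is zero even though the student's branches are identical, while the dual-branch loss is positive and the guidance gap exposes the collapse. This illustrates why a metric computed on one output alone can miss a loss of guidance.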