A Projection-Based Surrogate Gradient Interpretation for Neural Codec Wrappers
Researchers propose a new interpretation of surrogate gradients for training neural codec wrappers, showing that the SCALED method can be understood as a first-order approximation of video codecs. The technique enables end-to-end learning of pre- and post-processing networks alongside conventional codecs, reducing BD-Rate by up to 23.59% on x264.
This research addresses a fundamental challenge in neural codec optimization: neural networks cannot be trained directly alongside conventional video codecs, because the codecs' discrete encoding operations are non-differentiable. The surrogate gradient approach, particularly the SCALED method, circumvents this limitation by supplying usable gradient signals without training a separate network to mimic the codec. The authors' theoretical contribution, reinterpreting SCALED as a local approximation of the codec rather than merely a reparameterization trick, helps explain why the method works.
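The general idea of a surrogate gradient can be illustrated with a minimal Python sketch. It is not the SCALED method: it uses the simpler identity (straight-through) surrogate, and `toy_codec` is a hypothetical uniform quantizer standing in for a real codec. The function names, step size, and learning rate are all assumptions made for illustration.

```python
import numpy as np

def toy_codec(x, step=0.5):
    """Hypothetical stand-in for a codec: uniform quantization.
    Its true derivative is zero almost everywhere, so it gives no training signal."""
    return np.round(x / step) * step

def train_gain(x, target, steps=200, lr=0.05):
    """Fit a scalar pre-processing gain g so that toy_codec(g * x) approximates target."""
    g = 0.2  # arbitrary starting value
    for _ in range(steps):
        # Forward pass: run the real, non-differentiable operation.
        y = toy_codec(g * x)
        # Backward pass: pretend d(toy_codec)/d(input) == 1 (identity surrogate),
        # so dL/dg for L = mean((y - target)^2) becomes mean(2 * (y - target) * x).
        grad = np.mean(2.0 * (y - target) * x)
        g -= lr * grad
    return g

# Usage: recover a gain close to 1.5 despite the quantizer in the loop.
x = np.linspace(-4.0, 4.0, 81)
print(train_gain(x, 1.5 * x))
```

The forward pass always uses the real operation, so the loss reflects actual quantization error; only the backward pass is replaced by an approximation.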
The practical implications extend beyond downscaling operations to full neural wrapping architectures that combine pre- and post-processing networks. BD-Rate reductions of up to 23.59% on x264 and 20.07% on VVenC indicate that the method holds up across different codec standards and quality settings. This generalization is valuable because different applications and regions use different video codecs.
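For readers unfamiliar with the metric, BD-Rate (Bjøntegaard delta rate) is the average bitrate difference between two rate-distortion curves at equal quality. The sketch below follows the commonly used cubic-fit formulation and is not taken from the paper's evaluation code. The function name and the sample rate/PSNR points are invented for illustration.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test curve versus the anchor at equal PSNR.
    Negative values mean the test curve needs fewer bits."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    poly_a = np.polyfit(psnr_anchor, log_ra, 3)
    poly_t = np.polyfit(psnr_test, log_rt, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # Convert the mean log-rate difference back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0

# Usage with made-up points: the test curve uses 20% fewer bits at every PSNR.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
psnrs = [32.0, 35.0, 38.0, 41.0]
test_rates = [0.8 * r for r in anchor_rates]
print(bd_rate(anchor_rates, psnrs, test_rates, psnrs))
```

Under this convention, a reported BD-Rate reduction of 23.59% corresponds to a value of about -23.59: the wrapped codec needs that much less bitrate on average for the same quality.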
For the broader AI and machine learning community, this represents progress in handling non-differentiable operations, a problem that extends beyond video compression to robotics, discrete optimization, and quantized neural networks. The theoretical grounding provided by the projection-based interpretation could support more principled development of surrogate gradient methods across these domains.
Future work likely includes exploring whether similar projection-based interpretations apply to other surrogate gradient techniques, integrating the method with hardware-accelerated codec implementations, and extending it to other codec standards such as AV1 and VP9.
- Surrogate gradients enable end-to-end neural network training with non-differentiable video codecs through first-order local approximations.
- The SCALED method achieves up to 23.59% BD-Rate reduction on x264 and 20.07% on VVenC through learned pre- and post-processing.
- The projection-based interpretation provides a theoretical foundation for understanding why surrogate gradients work, improving interpretability and reproducibility.
- The approach generalizes across different video codecs, quality factors, and multiple downscaling ratios without codec-specific retraining.
- This technique has potential applications beyond video compression to other domains involving non-differentiable discrete operations.