Neural Network Compression by Approximate Differential Equivalence
Researchers propose a neural network compression method that encodes a trained network as a polynomial ordinary differential equation (ODE) system and uses Approximate Forward Differential Equivalence to aggregate neurons with similar functional behavior, rather than pruning weights independently. The approach is reported to reduce parameter counts substantially while maintaining accuracy, and to outperform traditional magnitude-based pruning on synthetic and public benchmarks.
This research takes a different perspective on neural network compression by treating models as dynamical systems rather than static weight matrices. Instead of removing individual parameters based on importance scores, the method identifies neurons that exhibit similar computational dynamics and merges them, so that compression acts on groups of neurons rather than isolated weights. This shift from weight-centric to function-centric compression is the work's main conceptual contribution.
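The merging idea can be illustrated with a simplified static analogue. The sketch below is not the paper's algorithm and does not build the polynomial ODE encoding. It only shows the exact special case in a one-hidden-layer tanh network: hidden neurons with identical incoming weights and bias always produce the same activation, so they can be lumped into one neuron whose outgoing weight is the sum of theirs. The function names `forward` and `merge_identical_neurons` and the toy weights are assumptions made for illustration.

```python
import math

def forward(W1, b1, W2, x):
    """One-hidden-layer tanh network with a linear readout (no output bias)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(out_row, hidden)) for out_row in W2]

def merge_identical_neurons(W1, b1, W2):
    """Lump hidden neurons that share incoming weights and bias exactly.

    Hypothetical helper: a static stand-in for aggregation by equivalence,
    not the method described in the paper.
    """
    groups = {}  # (incoming weights, bias) -> indices of neurons in that group
    for j, (row, b) in enumerate(zip(W1, b1)):
        groups.setdefault((tuple(row), b), []).append(j)
    new_W1 = [list(key[0]) for key in groups]
    new_b1 = [key[1] for key in groups]
    # One merged neuron stands in for the whole group, so it inherits
    # the sum of the group's outgoing weights.
    new_W2 = [[sum(out_row[j] for j in members) for members in groups.values()]
              for out_row in W2]
    return new_W1, new_b1, new_W2

# Toy network: hidden neurons 0 and 1 are exact duplicates.
W1 = [[0.5, -1.0], [0.5, -1.0], [2.0, 0.3]]
b1 = [0.1, 0.1, -0.4]
W2 = [[1.0, 2.0, -1.5]]

mW1, mb1, mW2 = merge_identical_neurons(W1, b1, W2)
x = [0.7, -0.2]
print(len(W1), "->", len(mW1), "hidden neurons")
print(forward(W1, b1, W2, x), forward(mW1, mb1, mW2, x))
```

Here the merged network has one fewer hidden neuron and computes the same output, which is the sense in which aggregation differs from deleting weights: nothing is removed from the function, only redundant copies of it.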
The approach targets a practical bottleneck in deploying neural networks at scale. As AI systems grow more complex, compression becomes essential for edge deployment, lower inference costs, and real-time applications. Existing magnitude-based pruning methods operate locally on individual weights without considering how neurons collectively contribute to model behavior, so they can miss redundancy that only shows up at the level of whole neurons.
The polynomial ODE encoding gives the method a formal footing and makes it interpretable: a single tolerance parameter controls the compression-accuracy trade-off. Results on dynamical system benchmarks are consistent with the underlying theory, while public regression benchmarks indicate practical utility. Consistent outperformance against Wanda and magnitude-based pruning suggests that the differential equivalence framework captures structural properties missed by weight-magnitude heuristics.
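How a single tolerance can trade size against accuracy is easiest to see in a toy setting. The sketch below is an assumption-laden stand-in, not the paper's procedure: it greedily groups hidden neurons whose incoming weights and bias differ by at most `eps` in the max norm, replaces each group with one neuron using the mean incoming parameters and the summed outgoing weights, and then reports network size and output deviation for a few tolerances. The helper `merge_within_tolerance`, the toy weights, and the probe inputs are all hypothetical.

```python
import math

def forward(W1, b1, W2, x):
    """One-hidden-layer tanh network with a linear readout (no output bias)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(out_row, hidden)) for out_row in W2]

def merge_within_tolerance(W1, b1, W2, eps):
    """Greedily group hidden neurons whose incoming parameters differ by at most eps."""
    params = [row + [b] for row, b in zip(W1, b1)]  # incoming weights plus bias
    groups = []  # lists of neuron indices; the first member is the representative
    for j, p in enumerate(params):
        for members in groups:
            rep = params[members[0]]
            if max(abs(a - c) for a, c in zip(p, rep)) <= eps:
                members.append(j)
                break
        else:
            groups.append([j])  # no existing group is close enough
    new_W1, new_b1 = [], []
    for members in groups:
        # The merged neuron takes the mean incoming parameters of its group.
        mean = [sum(params[j][k] for j in members) / len(members)
                for k in range(len(params[0]))]
        new_W1.append(mean[:-1])
        new_b1.append(mean[-1])
    # Outgoing weights are summed so the merged neuron carries the group's total contribution.
    new_W2 = [[sum(out_row[j] for j in members) for members in groups]
              for out_row in W2]
    return new_W1, new_b1, new_W2

# Toy network with two pairs of nearly identical hidden neurons.
W1 = [[1.00, 0.50], [1.02, 0.49], [-0.80, 0.30], [-0.79, 0.33], [0.10, -1.20]]
b1 = [0.00, 0.01, 0.20, 0.18, -0.50]
W2 = [[0.6, 0.4, -1.0, 0.5, 0.8]]
probes = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.25]]

results = []
for eps in (0.0, 0.05, 2.0):
    mW1, mb1, mW2 = merge_within_tolerance(W1, b1, W2, eps)
    err = max(abs(forward(W1, b1, W2, x)[0] - forward(mW1, mb1, mW2, x)[0])
              for x in probes)
    results.append((eps, len(mW1), err))
    print(f"eps={eps:<5} hidden neurons={len(mW1)} max output error={err:.4f}")
```

With these toy weights, a tolerance of zero leaves all five hidden neurons in place, a small tolerance merges the two near-duplicate pairs, and a very large one collapses the layer to a single neuron. The printed error column shows what each setting costs on the probe inputs.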
For practitioners, this work offers an alternative tool in the compression toolkit, particularly valuable for models where functional redundancy matters more than individual weight importance. The approach may prove especially effective for scientific computing and physics-informed neural networks where the underlying dynamics are known. Scaling the method to transformers and large language models remains a direction for future research, and success there would have substantial commercial implications for inference optimization.
- Differential equivalence-based compression aggregates functionally similar neurons rather than pruning weights independently.
- The method encodes trained networks as polynomial ODE systems to identify neurons with approximately matching induced dynamics.
- A single tolerance parameter enables smooth, interpretable trade-offs between model size and predictive accuracy.
- The approach outperforms magnitude-based pruning and Wanda on both synthetic dynamical systems and public benchmarks.
- The framework provides theoretical rigor and potential advantages for scientific computing and physics-informed applications.