#neural-network-compression News & Analysis

7 articles tagged with #neural-network-compression. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 8 · 7/10

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

Researchers propose Neuron-Level Mixed-Precision Quantization Aware Training (NMP-QAT), a neural network compression technique that independently optimizes precision for individual neurons rather than entire layers. The method achieves better compression-accuracy trade-offs than existing approaches, making it particularly valuable for deploying AI models on resource-constrained edge devices in 6G networks.
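The idea of choosing precision per neuron rather than per layer can be illustrated with a minimal sketch. It assumes plain symmetric uniform quantization applied after the fact to each row of a weight matrix; the function names and the hand-picked bit widths are hypothetical, and this is not the NMP-QAT training procedure itself, which learns the precisions during training.

```python
def quantize_row(weights, bits):
    """Symmetric uniform quantization of one neuron's weights (assumed scheme)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per weight")
    levels = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / levels                  # one scale per neuron
    return [round(w / scale) * scale for w in weights]


def quantize_per_neuron(matrix, bits_per_neuron):
    """Each row (neuron) gets its own bit width instead of one per layer."""
    return [quantize_row(row, b) for row, b in zip(matrix, bits_per_neuron)]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical layer with two neurons; the first is kept at 8 bits, the second at 2.
layer = [[0.5, -0.25, 0.1], [0.5, -0.25, 0.1]]
quantized = quantize_per_neuron(layer, bits_per_neuron=[8, 2])
errors = [mean_squared_error(orig, q) for orig, q in zip(layer, quantized)]
```

Here the two neurons share the same weights, so `errors` isolates the effect of the bit width: the 8-bit neuron is reconstructed more closely than the 2-bit one.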

AI · Bullish · arXiv – CS AI · May 27 · 7/10

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Researchers introduce InfoQuant, a training-free method that optimizes activation distributions for low-bit quantization in large language models by using Peak Suppression Orthogonal Transformation. The technique achieves 97% accuracy preservation under W4A4KV4 quantization (4-bit weights, activations, and KV cache) and reduces performance degradation by 42% compared to previous methods, advancing efficient LLM deployment.
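Why an orthogonal transformation can help low-bit quantization is easiest to see on a toy vector. The sketch below uses a generic normalized Hadamard rotation as a stand-in; it is not InfoQuant's Peak Suppression Orthogonal Transformation, and the activation values are hand-picked assumptions. The rotation preserves the vector's norm while spreading the outlier across all coordinates, so the 4-bit grid is no longer stretched to cover one large value.

```python
# Normalized 4x4 Hadamard matrix: symmetric and orthogonal, so it is its own inverse.
H = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def rotate(vec):
    """Multiply by H; applying it twice returns the original vector."""
    return [sum(h * v for h, v in zip(row, vec)) for row in H]


def quantize(vec, bits=4):
    """Symmetric uniform quantization with a single scale for the whole vector."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in vec) / levels
    if scale == 0:
        return list(vec)
    return [round(v / scale) * scale for v in vec]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


x = [8.0, 1.0, -1.0, 0.5]          # hypothetical activations with one outlier
y = rotate(x)                      # peak magnitude drops from 8.0 to 4.75

direct_err = squared_error(x, quantize(x))
rotated_err = squared_error(x, rotate(quantize(y)))   # quantize in rotated space, rotate back
```

For this particular vector `rotated_err` is smaller than `direct_err`. That is not guaranteed for every input, which is presumably why methods in this area design or optimize the transformation rather than using a fixed one.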

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

Hierarchical Reinforcement Learning for Neural Network Compression (HiReLC): Pruning and Quantization

Researchers introduce HiReLC, a hierarchical reinforcement learning framework that automates the joint compression of neural networks through pruning and quantization. The system achieves 5.99-6.72x compression ratios across Vision Transformers and CNNs with minimal accuracy loss, using a two-level agent architecture guided by Fisher Information sensitivity estimates.
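As a rough illustration of how pruning and quantization compound into a single compression ratio, the sketch below counts stored bits per layer. The layer sizes, sparsities, and bit widths are made-up values, the 32-bit baseline is an assumption, and the formula ignores sparse-index and scale metadata; it is not taken from the HiReLC paper.

```python
def compression_ratio(layers, baseline_bits=32):
    """Ratio of original to compressed storage.

    Each layer is (num_weights, sparsity, bits): `sparsity` is the pruned
    fraction and `bits` is the precision of the surviving weights.
    """
    original = sum(n * baseline_bits for n, _, _ in layers)
    compressed = sum(n * (1.0 - sparsity) * bits for n, sparsity, bits in layers)
    return original / compressed


# Hypothetical two-layer model: a sensitive layer pruned lightly,
# a less sensitive one pruned more aggressively.
layers = [
    (4_000_000, 0.25, 8),
    (1_000_000, 0.50, 8),
]
ratio = compression_ratio(layers)
```

A per-layer table like `layers` is the kind of decision an automated search would have to produce; the ratio only tells half the story, since accuracy has to be measured separately.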

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks

Researchers introduce Automatically Differentiable Nonlinear Tensor Networks (ADNTNs), a novel technique for compressing deep neural networks by building large weight tensors from hierarchical small cores with nonlinear activations. The method achieves compression ratios from 2,000× to 77,000× on standard architectures like AlexNet and VGG-16 while maintaining or improving accuracy, representing a mathematically structured approach to reducing model size.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Neural Network Compression by Approximate Differential Equivalence

Researchers propose a novel neural network compression method using polynomial ODE systems and Approximate Forward Differential Equivalence to aggregate neurons with similar functional behavior, rather than pruning weights independently. The approach achieves significant parameter reduction while maintaining accuracy, outperforming traditional magnitude-based pruning methods across synthetic and public benchmarks.
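The contrast between aggregating neurons and pruning weights can be shown in the simplest exact case. The sketch below merges hidden neurons that have identical incoming weights and biases by summing their outgoing weights, which leaves the network's output unchanged. This is only the exact-duplicate special case on a hypothetical one-hidden-layer ReLU network; the paper's approximate, ODE-based equivalence is not reproduced here.

```python
def relu(z):
    return max(0.0, z)


def forward(W, b, v, x):
    """One hidden ReLU layer followed by a linear readout with weights v."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def merge_duplicate_neurons(W, b, v):
    """Collapse neurons with identical (weights, bias) into one, summing readouts."""
    merged = {}                                # (row, bias) -> summed outgoing weight
    for row, bi, vi in zip(W, b, v):
        key = (tuple(row), bi)
        merged[key] = merged.get(key, 0.0) + vi
    W2 = [list(row) for row, _ in merged]
    b2 = [bias for _, bias in merged]
    v2 = list(merged.values())
    return W2, b2, v2


# Hypothetical network: neurons 0 and 2 compute the same function of the input.
W = [[1.0, -2.0], [0.5, 0.5], [1.0, -2.0]]
b = [0.1, 0.0, 0.1]
v = [2.0, -1.0, 3.0]

W2, b2, v2 = merge_duplicate_neurons(W, b, v)
x = [1.0, 0.2]
before, after = forward(W, b, v, x), forward(W2, b2, v2, x)
```

Magnitude pruning would instead zero out the smallest individual weights, which changes the function unless those weights happen to be irrelevant; merging removes a whole neuron while keeping `before` and `after` equal up to floating-point rounding.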

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

Researchers introduce Vanishing Contributions (VCON), a unified framework for compressing deep neural networks through gradual parallel execution of original and compressed models. The technique demonstrates 1-15% accuracy improvements across vision and NLP tasks compared to existing compression methods.
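One plausible reading of "gradual parallel execution" is that both models run side by side and their outputs are blended, with the original's share shrinking to zero. The sketch below assumes a linear schedule and a simple weighted sum; the schedule, the function names, and the toy models are all assumptions rather than details confirmed by the summary.

```python
def original_share(step, total_steps):
    """Weight of the original model: 1.0 at step 0, 0.0 from total_steps onward."""
    return max(0.0, 1.0 - step / total_steps)


def blended_output(original, compressed, x, step, total_steps):
    """Run both models in parallel and mix their outputs."""
    beta = original_share(step, total_steps)
    return beta * original(x) + (1.0 - beta) * compressed(x)


# Toy stand-ins for an original model and its compressed counterpart.
original = lambda x: 2.0 * x
compressed = lambda x: 1.9 * x

start = blended_output(original, compressed, 3.0, step=0, total_steps=100)
end = blended_output(original, compressed, 3.0, step=100, total_steps=100)
```

At step 0 the blend equals the original model's output, and at the final step it equals the compressed model's output, so fine-tuning during the transition would see a smooth change rather than an abrupt swap.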

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models

Researchers introduce CLASP, a token reduction framework that optimizes Multimodal Large Language Models by intelligently pruning visual tokens through class-adaptive layer fusion and dual-stage pruning. The approach addresses computational inefficiency in MLLMs while maintaining performance across diverse benchmarks and architectures.