AIBullisharXiv – CS AI · 6h ago6/10
🧠
EPTS: Elastic Post-Training Sparsity for Efficient Large Language Model Compression
Researchers introduce EPTS, a new framework for compressing large language models that enables a single optimized model to perform efficiently across multiple sparsity levels, eliminating the need for separate optimization for each deployment scenario. This approach combines Multi-Sparsity Hierarchy LoRA and a Feature Mixer mechanism to maintain performance while reducing computational requirements.