arXiv CS AI · 5d ago
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
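To make the idea of atomic components concrete, here is a minimal sketch. It rests on assumptions the summary does not confirm: that each expert is a gated feed-forward block, and that an "atomic component" is a single hidden unit (one gate row, one up-projection row, and one down-projection column). The importance score used here, the mean squared norm of a unit's output contribution, is only a stand-in for the Hessian-based criterion, which the summary does not spell out. All function names are hypothetical.

```python
import math
import random

def silu(z):
    return z / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def atomic_output(x, gate_row, up_row, down_col):
    """Contribution of one hidden unit: a scalar activation times one output column."""
    scale = silu(dot(gate_row, x)) * dot(up_row, x)
    return [scale * d for d in down_col]

def expert_output(x, gate, up, down_cols, keep=None):
    """Sum atomic contributions; `keep` optionally restricts to unpruned units."""
    idx = range(len(gate)) if keep is None else keep
    out = [0.0] * len(down_cols[0])
    for j in idx:
        contrib = atomic_output(x, gate[j], up[j], down_cols[j])
        out = [o + c for o, c in zip(out, contrib)]
    return out

def score_units(samples, gate, up, down_cols):
    """Proxy importance (an assumption, not the Hessian criterion):
    mean squared norm of each unit's output contribution."""
    scores = []
    for j in range(len(gate)):
        total = 0.0
        for x in samples:
            c = atomic_output(x, gate[j], up[j], down_cols[j])
            total += dot(c, c)
        scores.append(total / len(samples))
    return scores

def prune(scores, ratio):
    """Return indices of units kept after dropping the lowest-scoring fraction."""
    n_drop = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(order[n_drop:])

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Toy expert: 8 hidden units, model width 4, random weights and calibration inputs.
random.seed(0)
d_model, d_hidden = 4, 8
gate = rand_matrix(d_hidden, d_model)
up = rand_matrix(d_hidden, d_model)
down_cols = rand_matrix(d_hidden, d_model)  # one output column per hidden unit
samples = rand_matrix(16, d_model)

scores = score_units(samples, gate, up, down_cols)
kept = prune(scores, 0.25)  # drop the 2 lowest-scoring units out of 8
pruned_out = expert_output(samples[0], gate, up, down_cols, keep=kept)
```

Because the expert's output is a plain sum over units, removing a unit changes the output by exactly that unit's contribution. This is why scoring in output space can work at a finer grain than dropping whole experts.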