AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
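To make the idea of "atomic components" concrete, here is a minimal NumPy sketch. It assumes a gated feed-forward expert of the form `w_down @ (silu(w_gate @ x) * (w_up @ x))`, so that each intermediate unit contributes one rank-one term to the expert's output. The function names, the shapes, and the output-energy score are illustrative assumptions: the score is a simple stand-in, not the Hessian-based criterion the summary refers to.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def expert_forward(x, w_gate, w_up, w_down, keep=None):
    """Assumed gated FFN expert; `keep` is an optional boolean mask over units."""
    h = silu(w_gate @ x) * (w_up @ x)          # shape (d_ff,)
    if keep is not None:
        h = h * keep                            # zero out pruned units
    return w_down @ h                           # shape (d_model,)

def atomic_outputs(x, w_gate, w_up, w_down):
    """Column i is the output contribution of intermediate unit i."""
    h = silu(w_gate @ x) * (w_up @ x)
    return w_down * h                           # shape (d_model, d_ff)

def prune_mask(X, w_gate, w_up, w_down, ratio=0.25):
    """Drop the `ratio` fraction of units with the lowest output energy.

    Output energy over calibration inputs X is a hypothetical proxy score,
    used here only to illustrate pruning at the granularity of single units.
    """
    scores = np.zeros(w_down.shape[1])
    for x in X:
        scores += (atomic_outputs(x, w_gate, w_up, w_down) ** 2).sum(axis=0)
    n_drop = int(ratio * scores.size)
    keep = np.ones(scores.size, dtype=bool)
    keep[np.argsort(scores)[:n_drop]] = False
    return keep
```

Summing the columns of `atomic_outputs` reproduces `expert_forward`, which is what allows individual units, rather than whole experts, to be removed. In this sketch, dropping a unit removes one row of `w_gate` and `w_up` and one column of `w_down`, so the saved computation scales with the fraction of units pruned.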