🧠 AI | 🟢 Bullish | Importance 7/10

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

arXiv – CS AI | Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang | March 3, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduce HEAPr, a pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes each expert into smaller atomic experts and prunes at that finer granularity instead of removing whole experts. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing FLOPs by approximately 20%.
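To make "atomic components" concrete, here is a minimal sketch. It assumes each expert is a SwiGLU-style gated feed-forward block and that an atomic expert is a single hidden unit: one row of the gate and up projections paired with the matching column of the down projection. That reading, and every name below (`expert_forward`, `atomic_outputs`, `prune_atoms`), is an illustrative assumption rather than the paper's code.

```python
import numpy as np

def silu(z):
    """SiLU activation, used here as the assumed gate nonlinearity."""
    return z / (1.0 + np.exp(-z))

def expert_forward(x, w_gate, w_up, w_down):
    """Gated FFN expert: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    return w_down @ (silu(w_gate @ x) * (w_up @ x))

def atomic_outputs(x, w_gate, w_up, w_down):
    """One d-dimensional output per hidden unit, shape (d_ff, d)."""
    h = silu(w_gate @ x) * (w_up @ x)   # hidden activations, shape (d_ff,)
    return h[:, None] * w_down.T        # row j equals h[j] * w_down[:, j]

def prune_atoms(w_gate, w_up, w_down, keep):
    """Drop hidden units where keep is False (rows of gate/up, columns of down)."""
    return w_gate[keep], w_up[keep], w_down[:, keep]

# Toy sizes and random weights (assumed, for illustration only).
rng = np.random.default_rng(0)
d, d_ff = 8, 32
x = rng.standard_normal(d)
w_gate = rng.standard_normal((d_ff, d))
w_up = rng.standard_normal((d_ff, d))
w_down = rng.standard_normal((d, d_ff))

full = expert_forward(x, w_gate, w_up, w_down)
atoms = atomic_outputs(x, w_gate, w_up, w_down)
# The expert's output is exactly the sum of its atomic outputs.
assert np.allclose(full, atoms.sum(axis=0))

# Pruning 25% of the atoms shrinks the weight matrices and removes
# exactly those atoms' contributions from the output.
keep = np.arange(d_ff) >= 8
pruned = expert_forward(x, *prune_atoms(w_gate, w_up, w_down, keep))
assert np.allclose(pruned, atoms[keep].sum(axis=0))
```

Because the expert output is a plain sum over atoms, removing one atom changes the output by exactly that atom's contribution, which is what makes per-atom importance scores meaningful in this sketch.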

Key Takeaways

→HEAPr enables more granular pruning of MoE models by breaking down experts into smaller atomic components rather than pruning entire experts.
→The algorithm reduces space complexity from O(d^4) to O(d^2), where d is the model dimension, by computing second-order (Hessian) information over atomic expert outputs rather than over expert parameters.
→In tests on DeepSeek MoE and Qwen MoE models, HEAPr outperforms existing expert-level pruning methods.
→The method requires only two forward passes and one backward pass on a calibration set to compute atomic expert importance.
→HEAPr achieves nearly lossless compression at 20-25% pruning ratios while reducing FLOPs by approximately 20%.
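The takeaways above describe scoring atomic experts with second-order information gathered in two forward passes and one backward pass. The sketch below is one plausible reading of that recipe, not the paper's published algorithm. It assumes the output-space Hessian is approximated by the averaged outer product of output gradients (a Fisher-style approximation), so only a d × d matrix is stored, whereas a Hessian over an expert's weights (on the order of d^2 parameters) would have on the order of d^4 entries. Random arrays stand in for real calibration activations and gradients, and all names (`accumulate_grad_outer`, `atomic_scores`, `keep_mask`) are hypothetical.

```python
import numpy as np

def accumulate_grad_outer(grads):
    """Pass 1 (forward + backward): average g g^T over calibration tokens.

    grads: (n_tokens, d) loss gradients w.r.t. the expert output.
    Returns a (d, d) matrix, i.e. O(d^2) memory.
    """
    return grads.T @ grads / grads.shape[0]

def atomic_scores(atom_out, grad_outer):
    """Pass 2 (forward only): score_j = 0.5 * mean_t f_tj^T G f_tj.

    atom_out: (n_tokens, n_atoms, d) per-token atomic expert outputs.
    """
    quad = np.einsum("tjd,de,tje->tj", atom_out, grad_outer, atom_out)
    return 0.5 * quad.mean(axis=0)

def keep_mask(scores, prune_ratio):
    """Boolean mask that drops the lowest-scoring fraction of atoms."""
    n_prune = int(round(prune_ratio * scores.size))
    order = np.argsort(scores)          # ascending: least important first
    mask = np.ones(scores.size, dtype=bool)
    mask[order[:n_prune]] = False
    return mask

# Synthetic stand-ins for calibration data (assumed, for illustration only).
rng = np.random.default_rng(0)
n_tokens, n_atoms, d = 64, 32, 8
grads = rng.standard_normal((n_tokens, d))
atom_out = rng.standard_normal((n_tokens, n_atoms, d))

G = accumulate_grad_outer(grads)
scores = atomic_scores(atom_out, G)
mask = keep_mask(scores, prune_ratio=0.25)
```

With a 25% pruning ratio and 32 atoms, `mask` keeps 24 atoms. In a real model the mask would then select rows and columns of the expert's projection matrices.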