
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

arXiv – CS AI | Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang
AI Summary

Researchers introduce HEAPr, a pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes each expert into smaller atomic components, so that pruning can remove individual components instead of whole experts. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing FLOPs by approximately 20%.
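
To make the idea of an atomic component concrete, here is a minimal sketch. It assumes a gated feed-forward expert of the form W_down(silu(W_gate x) * (W_up x)) and treats each intermediate neuron (one gate row, one up row, one down column) as an atomic expert. That reading is an assumption made for illustration, and every function name and toy size below is hypothetical rather than taken from the paper.

```python
import math
import random

def silu(z):
    return z / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def atomic_output(x, gate_row, up_row, down_col):
    """One atomic expert: a scalar activation scaling one down-projection column."""
    scale = silu(dot(gate_row, x)) * dot(up_row, x)
    return [scale * w for w in down_col]

def expert_output(x, w_gate, w_up, w_down_cols, keep=None):
    """Sum the atomic experts listed in `keep` (all of them when keep is None)."""
    indices = range(len(w_gate)) if keep is None else keep
    out = [0.0] * len(x)
    for i in indices:
        contrib = atomic_output(x, w_gate[i], w_up[i], w_down_cols[i])
        out = [o + c for o, c in zip(out, contrib)]
    return out

def dense_expert(x, w_gate, w_up, w_down_cols):
    """Reference: the usual matrix form of the same gated expert."""
    hidden = [silu(dot(g, x)) * dot(u, x) for g, u in zip(w_gate, w_up)]
    return [sum(hidden[i] * w_down_cols[i][j] for i in range(len(hidden)))
            for j in range(len(x))]

# Hypothetical toy sizes: model dimension 4, intermediate dimension 6.
random.seed(0)
d, m = 4, 6
x = [random.uniform(-1, 1) for _ in range(d)]
w_gate = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
w_up = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
w_down_cols = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(m)]

full = expert_output(x, w_gate, w_up, w_down_cols)
reference = dense_expert(x, w_gate, w_up, w_down_cols)
# Atomic-level pruning: drop atomic experts 1 and 4, keep the rest of the expert.
pruned = expert_output(x, w_gate, w_up, w_down_cols, keep=[0, 2, 3, 5])
```

Because the expert's output is an exact sum of its atomic outputs, removing one atomic expert changes the output by exactly that component's contribution, which is what makes per-component importance scores meaningful.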

Key Takeaways
  • HEAPr enables more granular pruning of MoE models by breaking down experts into smaller atomic components rather than pruning entire experts.
  • The algorithm reduces space complexity from O(d^4) to O(d^2) by computing second-order (Hessian) information over atomic expert outputs rather than over expert parameters.
  • Testing on DeepSeek MoE and Qwen MoE models shows superior performance compared to existing expert-level pruning methods.
  • The method requires only two forward passes and one backward pass on a calibration set to compute atomic expert importance.
  • HEAPr achieves nearly lossless compression at 20-25% pruning ratios while reducing FLOPs by approximately 20%.
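
The takeaways above can be illustrated with a rough sketch of output-space importance scoring. This is not the paper's exact formula. It assumes a generic second-order proxy in which the Hessian at a layer's output is approximated by the averaged outer product of output gradients (a d x d matrix, hence O(d^2) storage), and each atomic expert is scored by the quadratic form of its output under that matrix. All names and toy numbers below are hypothetical.

```python
def outer_accumulate(G, g, weight):
    """Add weight * g g^T into the d x d matrix G in place."""
    for a in range(len(g)):
        for b in range(len(g)):
            G[a][b] += weight * g[a] * g[b]

def quad_form(o, G):
    """Compute o^T G o."""
    n = len(o)
    return sum(o[a] * G[a][b] * o[b] for a in range(n) for b in range(n))

def importance_scores(atomic_outputs, output_grads):
    """atomic_outputs[t][i]: d-vector output of atomic expert i on token t.
    output_grads[t]: d-vector loss gradient at the layer output for token t."""
    n_tokens = len(output_grads)
    d = len(output_grads[0])
    # Assumed Hessian proxy: mean of g g^T over calibration tokens (d x d only).
    G = [[0.0] * d for _ in range(d)]
    for g in output_grads:
        outer_accumulate(G, g, 1.0 / n_tokens)
    scores = [0.0] * len(atomic_outputs[0])
    for per_token in atomic_outputs:
        for i, o in enumerate(per_token):
            scores[i] += 0.5 * quad_form(o, G) / n_tokens
    return scores

def select_pruned(scores, ratio):
    """Indices of the lowest-scoring atomic experts to drop at the given ratio."""
    n_drop = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:n_drop])

# Hypothetical calibration data: 2 tokens, 4 atomic experts, d = 2.
atomic_outputs = [
    [[1.0, 0.0], [0.0, 0.1], [2.0, 1.0], [0.0, 0.0]],
    [[0.5, 0.5], [0.1, 0.0], [1.0, -1.0], [0.0, 0.0]],
]
output_grads = [[1.0, 0.0], [0.0, 1.0]]

scores = importance_scores(atomic_outputs, output_grads)
to_drop = select_pruned(scores, 0.25)
```

In a real model the atomic outputs and output gradients would come from forward and backward passes over a calibration set. The sketch only shows why a d x d matrix is enough once importance is measured in output space: an expert with on the order of d^2 parameters would otherwise need a parameter-space Hessian with on the order of d^4 entries.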