
ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits

arXiv – CS AI | Aryan Karmore | March 3, 2026 at 05:00 AM

🤖AI Summary

ButterflyMoE cuts the memory needed to store mixture-of-experts (MoE) weights by a reported 150× by parameterizing experts geometrically instead of storing an independent weight matrix for each one. Experts share ternary prototypes and differ only in learned rotations, which yields sub-linear memory scaling and makes it feasible to deploy many experts on edge devices.
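To make the idea of "shared ternary prototypes with learned rotations" concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the Givens-pair butterfly layout, the placement of one rotation before and one after the prototype, the scalar `scale`, and every function name (`butterfly_rotate`, `ternary_matvec`, `expert_forward`) are hypothetical choices made for this example.

```python
import math
import random


def butterfly_rotate(x, angles):
    """Apply log2(d) stages of 2x2 Givens rotations to x (len(x) is a power of 2).

    angles[s][k] is the angle for the k-th pair in stage s, so one rotation
    costs (d/2) * log2(d) parameters instead of d*d.
    """
    d = len(x)
    stages = d.bit_length() - 1
    assert 1 << stages == d, "length must be a power of two"
    assert len(angles) == stages
    y = list(x)
    for s in range(stages):
        stride = 1 << s
        k = 0
        for start in range(0, d, 2 * stride):
            for i in range(start, start + stride):
                j = i + stride
                c, sn = math.cos(angles[s][k]), math.sin(angles[s][k])
                y[i], y[j] = c * y[i] - sn * y[j], sn * y[i] + c * y[j]
                k += 1
    return y


def ternary_matvec(T, scale, x):
    """Multiply by a shared prototype whose entries are in {-1, 0, +1}."""
    return [scale * sum(t * v for t, v in zip(row, x)) for row in T]


def expert_forward(x, T, scale, in_angles, out_angles):
    """Assumed expert form: rotate, apply the shared prototype, rotate again."""
    h = butterfly_rotate(x, in_angles)
    h = ternary_matvec(T, scale, h)
    return butterfly_rotate(h, out_angles)


def random_angles(d):
    stages = d.bit_length() - 1
    return [[random.uniform(-math.pi, math.pi) for _ in range(d // 2)]
            for _ in range(stages)]


random.seed(0)
d = 8
# One ternary prototype shared by every expert.
T = [[random.choice((-1, 0, 1)) for _ in range(d)] for _ in range(d)]
# Each expert stores only its own rotation angles.
experts = [(random_angles(d), random_angles(d)) for _ in range(4)]
x = [random.gauss(0.0, 1.0) for _ in range(d)]
outputs = [expert_forward(x, T, 0.1, a_in, a_out) for a_in, a_out in experts]
```

In this sketch all four experts reuse the same matrix `T`; only the angle tables differ, which is where the per-expert cost of order d log d comes from.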

Key Takeaways

→For N experts with d×d weight matrices, ButterflyMoE reduces memory requirements from O(N·d²) to O(d² + N·d log d), which the authors describe as sub-linear scaling in the number of experts.
→With 256 experts, the method achieves a 150× memory reduction with negligible accuracy loss.
→Experts are treated as geometric reorientations of shared quantized substrates rather than independent matrices.
→The approach lets multiple AI experts run on edge devices where memory limits previously made this impossible.
→Combining learned rotations with quantization helps reduce activation outliers and stabilizes training at extremely low bit widths.
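The scaling claim in the takeaways can be checked with simple parameter counting. The sketch below is a rough illustration, not the paper's accounting: it assumes each expert needs one butterfly rotation with (d/2)·log2(d) angles, counts parameters rather than bits, and uses hypothetical helper names. It therefore does not reproduce the reported 150× figure, which depends on bit widths and other details not given in this summary.

```python
import math


def dense_params(n_experts, d):
    """N independent d x d expert matrices: O(N * d^2)."""
    return n_experts * d * d


def butterfly_params(n_experts, d):
    """One shared d x d prototype plus per-expert rotation angles.

    Assumption: one butterfly per expert with (d/2) * log2(d) angles,
    giving O(d^2 + N * d log d) overall.
    """
    per_expert = (d // 2) * int(math.log2(d))
    return d * d + n_experts * per_expert


def param_ratio(n_experts, d):
    return dense_params(n_experts, d) / butterfly_params(n_experts, d)


for n in (16, 64, 256):
    print(n, dense_params(n, 512), butterfly_params(n, 512),
          round(param_ratio(n, 512), 1))
```

Under these assumptions the ratio grows as experts are added, because the d² cost of the shared prototype is paid once while each extra expert adds only its rotation angles.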