AI · Neutral · arXiv – CS AI · 10h ago · 6/10
An Empirical Study of OpenPangu Quantization on Ascend NPUs
Researchers conducted a systematic empirical study of quantization methods for OpenPangu language models on Huawei Ascend NPUs. They found that 8-bit weight-only quantization is lossless, whereas 4-bit quantization remains practical for larger models but degrades performance on reasoning tasks in smaller ones. Extreme low-bit compression (2-bit and binary) remains fundamentally challenging: most configurations collapse to near-random behavior.
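To make the bit-width trade-off concrete, here is a minimal sketch, not the study's method. It assumes symmetric per-channel round-to-nearest weight-only quantization applied to synthetic Gaussian weights, and for the binary case it assumes the common sign(w) scaled by mean(|w|) scheme. The helpers `quantize_dequantize` and `relative_error` are hypothetical names for illustration. The sketch measures only weight reconstruction error, not task accuracy on OpenPangu models or Ascend NPUs.

```python
import random


def quantize_dequantize(row, bits):
    """Quantize one weight row to `bits` bits and map it back to floats.

    Hypothetical helper: symmetric round-to-nearest with one scale per row
    (per-channel). bits == 1 uses an assumed binarization: sign(w) * mean(|w|).
    """
    if bits == 1:
        alpha = sum(abs(w) for w in row) / len(row)
        return [alpha if w >= 0 else -alpha for w in row]
    # Largest integer level: 127 for 8-bit, 7 for 4-bit, 1 for 2-bit,
    # so symmetric 2-bit keeps only three levels {-s, 0, s}.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in row) / qmax
    if scale == 0:
        return list(row)  # all-zero row: nothing to quantize
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in row]


def relative_error(weights, bits):
    """Relative L2 reconstruction error ||W - Q(W)|| / ||W|| over all rows."""
    num = den = 0.0
    for row in weights:
        deq = quantize_dequantize(row, bits)
        num += sum((w - d) ** 2 for w, d in zip(row, deq))
        den += sum(w ** 2 for w in row)
    return (num / den) ** 0.5


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for a weight matrix: 64 output channels x 256 inputs.
    weights = [[random.gauss(0.0, 0.02) for _ in range(256)] for _ in range(64)]
    for bits in (8, 4, 2, 1):
        print(f"{bits}-bit: relative error {relative_error(weights, bits):.4f}")
```

On such weights the 8-bit error is tiny while the 2-bit and binary errors are a large fraction of the weights themselves, which hints at why naive extreme low-bit quantization struggles. Note that with this crude max-based scaling, 2-bit can even reconstruct worse than the mean-scaled binary scheme, so the ordering at the lowest bit widths depends heavily on how scales are chosen.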