AI · Bullish · arXiv – CS AI · 6d ago · 6/104
Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats
Researchers evaluated the HiFloat formats (HiF8 and HiF4) for low-bit inference on Ascend NPUs and found that they outperform integer formats on high-variance data and prevent accuracy collapse in 4-bit regimes. The study also shows that HiFloat is compatible with existing quantization frameworks and suggests it could make large language model inference more efficient.
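To give a rough intuition for why a floating-point grid can cope better than a uniform integer grid when one outlier stretches the range, here is a minimal Python sketch. It is not the paper's method and it does not implement HiF4, whose bit layout is not described in this summary. As a stand-in it assumes a generic E2M1-style 4-bit float grid, per-tensor scaling, and a hand-picked toy tensor. The function names are hypothetical.

```python
# Illustrative stand-in grid: positive magnitudes of a generic E2M1 4-bit float.
# Assumption: this is NOT the HiF4 encoding, which the summary does not specify.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_int4(xs):
    """Symmetric per-tensor INT4: uniform levels -7..7 times one shared scale.

    Assumes xs contains at least one nonzero value.
    """
    scale = max(abs(x) for x in xs) / 7.0
    return [round(x / scale) * scale for x in xs]


def quantize_fp4(xs):
    """Per-tensor scaling onto the non-uniform stand-in float grid.

    Assumes xs contains at least one nonzero value.
    """
    scale = max(abs(x) for x in xs) / 6.0
    out = []
    for x in xs:
        # Snap the scaled magnitude to the nearest representable grid point.
        mag = min(FP4_MAGNITUDES, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out


def mean_abs_error(xs, qs):
    """Average absolute difference between original and quantized values."""
    return sum(abs(x - q) for x, q in zip(xs, qs)) / len(xs)


# Hand-picked toy tensor: several small values plus one large outlier.
values = [0.4, -0.5, 0.6, -0.45, 0.55, 6.0]
int4 = quantize_int4(values)
fp4 = quantize_fp4(values)

print("INT4 mean abs error:", mean_abs_error(values, int4))
print("FP4  mean abs error:", mean_abs_error(values, fp4))
```

On this toy input the outlier sets the scale for both formats, but the float grid keeps finer steps near zero, so the small values are represented more closely. The input was chosen to make the effect visible, so this illustrates the mechanism only and says nothing about the accuracy results reported in the study.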