AI · Bullish · arXiv – CS AI · 9h ago · 7/10
Model-Preserving Adaptive Rounding
Researchers introduce YAQA, a quantization algorithm that improves model compression by directly minimizing the quantized model's end-to-end error rather than each layer's error in isolation. The authors report a 30% reduction in error compared with existing approaches such as GPTQ, report that the method outperforms even quantization-aware training, and back its performance with theoretical guarantees.
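
To make the distinction between layer-by-layer and end-to-end error concrete, here is a minimal sketch on a toy two-layer network. It is not YAQA: the weights are quantized with plain round-to-nearest, and the network, the 4-bit setting, and the helper names (`quantize_rtn`, `forward`) are all assumptions made for illustration. It only shows the two quantities being contrasted: a per-layer output error, and the KL divergence between the original and quantized model outputs.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest uniform quantization with one scale per tensor (illustrative only)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, w1, w2):
    """Toy model: linear -> ReLU -> linear -> softmax."""
    return softmax(np.maximum(x @ w1, 0.0) @ w2)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))
w1 = rng.normal(size=(16, 32)) / np.sqrt(16)
w2 = rng.normal(size=(32, 8)) / np.sqrt(32)
q1, q2 = quantize_rtn(w1), quantize_rtn(w2)

# Layer-wise view: each layer's output error, measured in isolation
# on the activations produced by the original (unquantized) model.
h = np.maximum(x @ w1, 0.0)
layer_err = [
    float(np.mean((x @ w1 - x @ q1) ** 2)),
    float(np.mean((h @ w2 - h @ q2) ** 2)),
]

# End-to-end view: KL divergence between the original model's output
# distribution and the fully quantized model's output distribution.
p = forward(x, w1, w2)
q = forward(x, q1, q2)
kl = float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

print("layer-wise errors:", layer_err)
print("end-to-end KL:", kl)
```

The layer-wise numbers treat each layer as an independent problem, while the KL term reflects how the errors combine through the whole network, which is the quantity the summary says the method targets.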