AI · Bullish · arXiv – CS AI · 14h ago · 7/10
Pushing the Limits of Block Rotations in Post-Training Quantization
Researchers present PeRQ, a post-training quantization method that uses permutations to optimize block rotations for neural network compression. The approach recovers up to 90% of full-vector rotation performance when quantizing large language models to INT4, significantly outperforming existing block rotation methods.
🏢 Perplexity · 🧠 Llama
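
To make the idea in the summary concrete, here is a minimal sketch of a permutation followed by a block rotation. It is not PeRQ's algorithm: the helper names (`hadamard`, `block_rotate`, `peak`, `int4_step`), the toy vector, and the hand-picked permutation are all assumptions for illustration, and a Hadamard matrix stands in for whatever rotation the paper uses. It only shows that the order in which channels are grouped into blocks changes the peak magnitude after rotation, and therefore the step size of an absmax INT4 grid.

```python
import math

def hadamard(n):
    """Sylvester Hadamard matrix of size n (a power of two), scaled to be orthonormal."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]

def block_rotate(x, perm, block):
    """Reorder x by perm, then rotate each contiguous block of size `block`."""
    assert sorted(perm) == list(range(len(x))), "perm must be a permutation"
    assert len(x) % block == 0, "length must be a multiple of the block size"
    h = hadamard(block)
    xp = [x[i] for i in perm]
    out = []
    for start in range(0, len(xp), block):
        chunk = xp[start:start + block]
        out.extend(sum(hij * c for hij, c in zip(row, chunk)) for row in h)
    return out

def peak(v):
    """Largest absolute value, which sets the range of an absmax quantizer."""
    return max(abs(a) for a in v)

def int4_step(v):
    """Step size of symmetric absmax INT4 quantization (integer levels -7..7)."""
    return peak(v) / 7.0

# Toy activation vector (hypothetical): two outliers sharing the first block.
x = [8.0, 8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
identity = list(range(8))
spread = [0, 2, 3, 4, 1, 5, 6, 7]  # hand-picked: one outlier per block

y_identity = block_rotate(x, identity, 4)
y_spread = block_rotate(x, spread, 4)

print(peak(y_identity), peak(y_spread))  # prints 9.0 5.5
```

In this toy case, moving one outlier into the second block lowers the post-rotation peak from 9.0 to 5.5 while the vector norm is unchanged, since both the permutation and the rotation are orthogonal. A smaller peak gives a finer INT4 step, but that is only a proxy: it does not by itself guarantee lower quantization error, and choosing a good permutation automatically is the hard part that this sketch leaves out.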