AI · Neutral · arXiv – CS AI · 10h ago · 6/10
On the Expressive Power of Weight Quantization in Large Language Models
Researchers establish theoretical limits on weight quantization in large language models, identifying 1.58 bits as the minimum precision below which expressive power collapses. The study shows that model performance degrades polynomially as the number of quantization bits decreases, providing a theoretical foundation for optimizing model compression and inference acceleration.
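
To make the 1.58-bit figure concrete, here is a minimal sketch. It assumes the threshold refers to three-valued (ternary) weights, since log2(3) ≈ 1.58; the summary itself does not say how the threshold is defined. The function names and the absolute-mean scaling rule are illustrative choices, not the paper's method.

```python
import math


def bits_per_weight(levels: int) -> float:
    """Bits needed to encode one weight that takes one of `levels` values."""
    return math.log2(levels)


def ternary_quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map each weight to -1, 0, or +1.

    Assumption: weights are scaled by their mean absolute value before
    rounding. This is one common choice, used here only for illustration.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.6]
codes, scale = ternary_quantize(weights)

print(round(bits_per_weight(3), 2))  # 1.58
print(codes)                         # [1, 0, -1, 1, 0, -1]
```

Under this reading, each weight carries about 1.58 bits, compared with 16 bits for a typical half-precision weight.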