
#post-training-quantization News & Analysis

4 articles tagged with #post-training-quantization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

🧠 AI · Bullish · arXiv – CS AI · 6d ago · 7/10

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

Researchers introduce MoBiE, a novel binarization framework designed specifically for Mixture-of-Experts large language models that achieves significant efficiency gains through weight compression while maintaining model performance. The method addresses unique challenges in quantizing MoE architectures and demonstrates over 2× inference speedup with substantial perplexity reductions on benchmark models.

๐Ÿข Perplexity
AINeutralarXiv โ€“ CS AI ยท 3d ago6/10
๐Ÿง 

Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos

Researchers provide the first rigorous theoretical analysis of OPTQ (GPTQ), a widely used post-training quantization algorithm for neural networks and LLMs, establishing quantitative error bounds and validating practical design choices. The study extends theoretical guarantees to both deterministic and stochastic variants of OPTQ and to the Qronos algorithm, offering guidance for regularization parameter selection and quantization alphabet sizing.
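
As background, OPTQ/GPTQ quantizes a layer's weight columns one at a time and spreads each column's quantization error over the not-yet-quantized columns using an inverse Hessian built from calibration activations. The sketch below is a simplified, unblocked version of that loop (dense matrix inverse instead of the Cholesky reformulation, a single global 4-bit symmetric scale); parameter names and the `damp` default are illustrative, not the paper's settings.

```python
import numpy as np

def quantize_rtn(col, scale):
    # symmetric round-to-nearest onto a 4-bit integer grid
    return np.clip(np.round(col / scale), -8, 7) * scale

def optq_like(W, X, scale, damp=0.01):
    """Quantize columns of W left to right; after each column, spread its
    quantization error over the remaining columns using the inverse of the
    damped Hessian H = 2 X X^T built from calibration inputs X (d x n)."""
    d = W.shape[1]
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)        # damping for stability
    Hinv = np.linalg.inv(H)
    Q = W.astype(np.float64).copy()
    for j in range(d):
        q = quantize_rtn(Q[:, j], scale)
        err = (Q[:, j] - q) / Hinv[j, j]
        Q[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])  # error compensation
        Q[:, j] = q
        # drop column j from the inverse Hessian of the remaining subproblem
        Hinv -= np.outer(Hinv[:, j], Hinv[j, :]) / Hinv[j, j]
    return Q
```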

🧠 AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Researchers conducted the first systematic study on post-training quantization for diffusion large language models (dLLMs), identifying activation outliers as a key challenge for compression. The study evaluated state-of-the-art quantization methods across multiple dimensions to provide insights for efficient dLLM deployment on edge devices.
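
The paper's focus on activation outliers echoes a common observation in LLM quantization: a handful of channels have magnitudes far above the rest and dominate the quantization error. The snippet below shows one simple way to flag such channels from calibration activations; the threshold, shapes, and function name are assumptions for illustration, not the study's actual procedure.

```python
import numpy as np

def find_outlier_channels(acts, ratio=6.0):
    """Flag activation channels whose peak magnitude dwarfs the rest.
    acts: (tokens, channels) calibration activations.
    Returns indices of channels whose max-abs exceeds `ratio` times the
    median per-channel max-abs; such channels are often kept in higher
    precision or given their own quantization scale."""
    ch_peak = np.abs(acts).max(axis=0)                  # per-channel peak
    outliers = np.nonzero(ch_peak > ratio * np.median(ch_peak))[0]
    return outliers, ch_peak

# example: a calibration batch with two artificially inflated channels
acts = np.random.randn(1024, 64)
acts[:, [3, 40]] *= 30.0
idx, _ = find_outlier_channels(acts)
print("outlier channels:", idx)                         # expect [ 3 40 ]
```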

🧠 AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization

Researchers introduce Quant Experts (QE), a new post-training quantization technique for vision-language models that uses adaptive error compensation with a mixture-of-experts architecture. The method addresses computational and memory overhead issues by intelligently handling token-dependent and token-independent channels, maintaining performance comparable to full-precision models across model scales from 2B to 70B parameters.
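
Without the paper's details, one way to picture MoE-style error compensation is a small router that selects, per token, which correction "expert" is added back to a quantized layer's output. The toy sketch below shows hard top-1 routing over a few correction matrices; all names, shapes, and the routing rule are hypothetical and do not reflect the Quant Experts design.

```python
import numpy as np

def moe_error_correction(x, gate_w, experts):
    """Toy top-1 MoE: a gating matrix scores each token, and the winning
    'expert' (a small correction matrix) produces an additive correction
    for that token's quantized-layer output.
    x:       (tokens, d_in) activations
    gate_w:  (d_in, n_experts) gating weights
    experts: list of n_experts arrays of shape (d_in, d_out)"""
    choice = (x @ gate_w).argmax(axis=1)                # hard top-1 routing
    out = np.zeros((x.shape[0], experts[0].shape[1]))
    for e, W_e in enumerate(experts):
        mask = choice == e
        out[mask] = x[mask] @ W_e                       # expert-specific correction
    return out

# usage: three experts correcting a 16 -> 16 quantized projection
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
gate_w = rng.standard_normal((16, 3))
experts = [0.01 * rng.standard_normal((16, 16)) for _ in range(3)]
correction = moe_error_correction(x, gate_w, experts)
```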