y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#int8 News & Analysis

3 articles tagged with #int8. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles
AIBullisharXiv – CS AI · May 277/10
🧠

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Researchers conducted an extensive empirical study evaluating FP8, INT8, and INT4 quantization formats across the Llama-3.1 model family, finding that FP8 is effectively lossless while INT4 weight-only quantization performs surprisingly well. The findings provide practical deployment guidelines for optimizing the accuracy-performance trade-off in large language model inference at scale.

🧠 Llama
AIBullisharXiv – CS AI · Mar 37/103
🧠

SageBwd: A Trainable Low-bit Attention

Researchers have developed SageBwd, a trainable INT8 attention mechanism that can match full-precision attention performance during pre-training while quantizing six of seven attention matrix multiplications. The study identifies key factors for stable training including QK-norm requirements and the impact of tokens per step on quantization errors.

AIBullisharXiv – CS AI · Apr 76/10
🧠

MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Researchers propose MUXQ, a new quantization technique for large language models that addresses activation outliers through low-rank decomposition. The method enables efficient INT8 quantization while maintaining accuracy close to FP16, making it suitable for edge device deployment with NPU-based hardware.

🏢 Perplexity