y0news

#mixed-precision News & Analysis

4 articles tagged with #mixed-precision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠

Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees

Researchers propose an expert-wise mixed-precision quantization strategy for Mixture-of-Experts models that assigns bit-widths based on router gradient changes and neuron variance. The method achieves higher accuracy than existing approaches while reducing inference memory usage on large-scale models such as Switch Transformer and Mixtral, with minimal computational overhead.
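The paper's exact sensitivity scoring isn't reproduced here; a minimal sketch of the general idea, assigning higher bit-widths to a top fraction of experts ranked by a combined sensitivity score (the `top_frac` rule and bit choices are illustrative, not from the paper):

```python
def assign_bitwidths(scores, low=4, high=8, top_frac=0.25):
    """Expert-wise mixed-precision assignment sketch.

    scores: one sensitivity value per expert (in the paper this combines
    router-gradient change and neuron variance; here it is just a number).
    The most sensitive `top_frac` of experts get `high` bits, the rest `low`.
    """
    k = max(1, int(len(scores) * top_frac))
    # Rank expert indices by sensitivity, most sensitive first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    bits = [low] * len(scores)
    for i in ranked[:k]:
        bits[i] = high
    return bits
```

With four experts and `top_frac=0.5`, the two most sensitive experts keep 8-bit weights while the others drop to 4-bit.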

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠

Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs

Researchers reproduced and analyzed the severe accuracy degradation that post-training quantization causes in BERT transformer models, showing validation accuracy dropping from 89.66% to 54.33%. The study found that structured activation outliers intensify with model depth, and that mixed-precision quantization is the most effective mitigation strategy.
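The standard mitigation the study points to is keeping outlier channels in higher precision while quantizing the rest to INT8. A toy sketch of that split, flagging channels whose absolute maximum exceeds a multiple of the median channel maximum (the threshold `k` is illustrative, not the paper's criterion):

```python
def split_outlier_channels(activations, k=6.0):
    """Partition channel indices into (high_precision, int8) sets.

    activations: list of per-channel activation value lists.
    A channel whose absolute max exceeds k x the median channel max is
    treated as an outlier channel and kept in higher precision.
    """
    maxima = [max(abs(v) for v in ch) for ch in activations]
    median = sorted(maxima)[len(maxima) // 2]
    high = [i for i, m in enumerate(maxima) if m > k * median]
    low = [i for i in range(len(activations)) if i not in high]
    return high, low
```

On a batch where one channel spikes to ±60 while the others stay below 1, only that channel is routed to the high-precision path.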

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves a 1.3-3.6x speedup on mixed-precision models while supporting clock frequencies up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.
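The core trick behind bitwise multi-precision hardware is bit-serial multiplication: a multiply becomes a sum of shifted partial products, one per weight bit, so the same processing element handles 2-, 4-, or 8-bit operands just by iterating a different number of bit planes. A software sketch of that principle (illustrative; not the paper's RTL):

```python
def bitserial_mul(a, w, bits=8):
    """Unsigned bit-serial multiply: accumulate a shifted copy of `a`
    for each set bit of `w`. Changing `bits` is the software analogue
    of reconfiguring the array's operand precision at runtime."""
    acc = 0
    for b in range(bits):
        if (w >> b) & 1:   # does bit plane b of the weight contribute?
            acc += a << b  # add the partial product a * 2^b
    return acc
```

Running the same loop with `bits=4` instead of `bits=8` processes half the bit planes, which is where the speedup on low-precision layers comes from.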

AI · Neutral · MarkTechPost · Apr 6 · 4/10
🧠

An Implementation Guide to Running NVIDIA Transformer Engine with Mixed Precision, FP8 Checks, Benchmarking, and Fallback Execution

A technical tutorial demonstrates how to implement NVIDIA's Transformer Engine with mixed-precision acceleration, covering GPU setup, CUDA compatibility verification, and fallback execution handling. The guide focuses on practical deep learning workflow optimization using FP8 precision and benchmarking techniques.
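The fallback-execution pattern the guide describes can be sketched library-free: attempt the FP8 path only when the hardware reports support, and drop to a higher-precision path on failure. The function names below are illustrative stand-ins for the tutorial's Transformer Engine calls:

```python
def run_with_fallback(fp8_fn, bf16_fn, fp8_supported):
    """Execute fp8_fn when FP8 is supported, falling back to bf16_fn
    otherwise or on a runtime failure. In the tutorial's setting,
    fp8_fn would wrap a forward pass under Transformer Engine's FP8
    autocast and bf16_fn the plain mixed-precision path (hypothetical
    wiring; only the control flow is shown here)."""
    if fp8_supported:
        try:
            return fp8_fn()
        except RuntimeError:
            pass  # e.g. unsupported GPU arch detected at kernel launch
    return bf16_fn()
```

The same guard is what makes a benchmarking script portable across GPUs with and without FP8 tensor cores.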

๐Ÿข Nvidia