y0news

#mixed-precision News & Analysis

7 articles tagged with #mixed-precision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

EdgeRazor introduces a lightweight quantization framework that compresses large language models to 1.88-bit precision while outperforming existing 3-bit methods. The approach combines mixed-precision quantization with knowledge distillation and achieves up to 15.1× faster decoding and an 80% storage reduction, while requiring a significantly smaller training compute budget than comparable techniques.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees

Researchers propose an expert-wise mixed-precision quantization strategy for Mixture-of-Experts models that assigns bit-widths based on router gradient changes and neuron variance. The method achieves higher accuracy than existing approaches while reducing inference memory on large-scale models such as Switch Transformer and Mixtral, at minimal additional compute cost.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves a 1.3–3.6× speedup on mixed-precision models while supporting higher clock frequencies of up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.

AI · Bullish · arXiv – CS AI · May 1 · 6/10

Mixed Precision Training of Neural ODEs

Researchers present a mixed-precision training framework for neural ODEs that reduces memory usage by about 50% and achieves up to a 2× speedup while maintaining accuracy. The approach uses low-precision computations for velocity evaluations and intermediate states while preserving high precision for weights and gradient accumulation, addressing computational and memory bottlenecks in continuous-time neural network architectures.
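
To make the precision split concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy scalar ODE, dx/dt = -w·x, integrated with forward Euler, and emulates fp16 by round-tripping values through the standard library's half-precision format. The names `to_half`, `euler_mixed`, and `accumulate` are hypothetical, and every numeric value is illustrative only.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def euler_full(w: float, x0: float, dt: float, steps: int) -> float:
    """Reference: forward Euler for dx/dt = -w * x, entirely in float64."""
    x = x0
    for _ in range(steps):
        x = x + dt * (-w * x)
    return x


def euler_mixed(w: float, x0: float, dt: float, steps: int) -> float:
    """Same integrator, with velocity and states rounded to emulated fp16."""
    w_lp = to_half(w)  # low-precision copy; the master weight w is untouched
    x = to_half(x0)
    for _ in range(steps):
        v = to_half(-w_lp * x)   # velocity evaluation in low precision
        x = to_half(x + dt * v)  # intermediate state stored in low precision
    return x


def accumulate(grads, low_precision: bool) -> float:
    """Sum gradient contributions in emulated fp16 or in float64."""
    total = 0.0
    for g in grads:
        if low_precision:
            total = to_half(total + to_half(g))
        else:
            total = total + g
    return total


# Illustrative values only (assumptions, not taken from the paper).
full = euler_full(0.5, 1.0, 0.1, 10)
mixed = euler_mixed(0.5, 1.0, 0.1, 10)
grads = [1e-4] * 10_000
hp_sum = accumulate(grads, low_precision=False)
lp_sum = accumulate(grads, low_precision=True)
print(full, mixed, hp_sum, lp_sum)
```

In this toy setting the low-precision trajectory stays close to the float64 reference. The fp16 gradient sum, however, stalls far below the true total of about 1.0, because each small addend eventually falls below half the spacing between representable values. That is one way to see why accumulation is kept in high precision.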

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Training Time Prediction for Mixed Precision-based Distributed Training

Researchers have developed a precision-aware training time predictor for distributed deep learning that accounts for floating-point precision settings, achieving a 9.8% prediction error compared with up to 147.85% error in existing models that ignore precision variations. The work addresses a critical gap in resource allocation and cost estimation for AI training workloads, where precision choices can create 2.4× variations in training time.
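
The summary does not name the error metric. Assuming it is a mean absolute percentage error (MAPE), the sketch below shows how a predictor that ignores precision can exceed 100% error. The function `mape` and all timing values are hypothetical and are not taken from the paper.

```python
def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Hypothetical measured training times (seconds) for two mixed-precision runs.
actual_s = [100.0, 100.0]

# A precision-unaware model that predicts full-precision timings,
# here assumed to be 2.4x slower than the measured runs.
precision_unaware_s = [240.0, 240.0]

# A precision-aware model with small residual errors (made-up values).
precision_aware_s = [110.0, 92.0]

unaware_error = mape(precision_unaware_s, actual_s)
aware_error = mape(precision_aware_s, actual_s)
print(unaware_error, aware_error)
```

With these made-up numbers, a 2.4× gap in training time alone produces an error of about 140%, so triple-digit errors are plausible for predictors that treat all precision settings alike.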