
Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs

arXiv – CS AI | Pranav Kumar Kaliaperumal
🤖 AI Summary

Researchers reproduced and analyzed the severe accuracy degradation that post-training quantization causes in BERT transformer models, with validation accuracy dropping from 89.66% to 54.33%. The study found that structured activation outliers intensify with model depth, and that mixed-precision quantization is the most effective mitigation strategy.
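
To make the failure mode concrete, here is a minimal sketch (in PyTorch, not the authors' code) of per-tensor symmetric int8 quantization, the scheme underlying global W8A8: a single outlier channel inflates the quantization scale and wipes out resolution for typical activations. The tensor size and outlier magnitude below are hypothetical.

```python
import torch

def quantize_int8(x: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor int8: the scale comes from the max |activation|."""
    scale = x.abs().max().item() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    return q.float() * scale

# Typical activations plus one heavy-tailed outlier (hypothetical magnitudes).
acts = torch.randn(1024) * 0.5
acts[0] = 60.0  # the kind of outlier reported to intensify with depth

q, scale = quantize_int8(acts)
err = (dequantize(q, scale) - acts).abs().mean().item()
print(f"scale={scale:.4f}, mean abs error={err:.4f}")
# Rerun with acts[0] = 0.5: the scale, and hence the rounding error,
# shrinks by more than an order of magnitude.
```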

Key Takeaways
  • Global W8A8 quantization causes a dramatic 35.33-point accuracy drop for BERT-base on the QNLI task.
  • Activation outliers show heavy-tailed behavior that intensifies with model depth, with 55% of activation energy concentrated in the top 1% of channels (see the energy-share sketch after this list).
  • Mixed-precision post-training quantization restores accuracy to 89.42%, near the FP32 baseline (also sketched below).
  • Per-embedding-group quantization is strongly sensitive to the grouping structure, with accuracy varying from 66.12% to 86.18% (see the per-group sketch below).
  • Hardware deployment shows minimal latency differences across methods, emphasizing the need for hardware-aware evaluation.
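
A hedged sketch of the channel-energy statistic behind the second takeaway, extended to the mixed-precision idea from the third: layers whose calibration activations concentrate energy in a few channels fall back to FP16, the rest stay int8. The 0.5 threshold, layer names, and synthetic activations are illustrative assumptions, not values from the paper.

```python
import torch

def top_channel_energy_share(acts: torch.Tensor, frac: float = 0.01) -> float:
    """Share of total squared activation energy in the top `frac` of channels.

    acts: (tokens, hidden) calibration activations for one layer.
    """
    energy = acts.pow(2).sum(dim=0)             # per-channel energy
    k = max(1, int(frac * energy.numel()))      # size of the top-1% bucket
    top = torch.topk(energy, k).values.sum()
    return (top / energy.sum()).item()

def assign_precision(calib: dict[str, torch.Tensor],
                     threshold: float = 0.5) -> dict[str, str]:
    """Outlier-heavy layers keep FP16; well-behaved layers get int8."""
    return {name: ("fp16" if top_channel_energy_share(a) > threshold else "int8")
            for name, a in calib.items()}

# Synthetic calibration data: a deep layer with dominant outlier channels
# versus a shallow, well-behaved one (hypothetical magnitudes).
deep = torch.randn(512, 768)
deep[:, :8] *= 20.0
shallow = torch.randn(512, 768)
print(assign_precision({"encoder.layer.11": deep, "encoder.layer.0": shallow}))
# -> {'encoder.layer.11': 'fp16', 'encoder.layer.0': 'int8'}
```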
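
And a hedged sketch of per-embedding-group quantization from the fourth takeaway: channels are partitioned into fixed-size groups, each with its own int8 scale, so an outlier contaminates only the group it lands in, which is why accuracy depends so strongly on the grouping structure. The group size of 64 and the synthetic tensor are illustrative assumptions.

```python
import torch

def quantize_per_group(acts: torch.Tensor, group_size: int = 64) -> torch.Tensor:
    """acts: (tokens, hidden); hidden must divide evenly into groups.

    Returns the dequantized tensor so reconstruction error can be inspected.
    """
    tokens, hidden = acts.shape
    g = acts.view(tokens, hidden // group_size, group_size)
    scale = g.abs().amax(dim=(0, 2), keepdim=True) / 127.0  # one scale per group
    q = torch.clamp(torch.round(g / scale), -128, 127)
    return (q * scale).view(tokens, hidden)

acts = torch.randn(512, 768)
acts[:, 5] *= 40.0  # the outlier inflates only its own group's scale
err = (quantize_per_group(acts) - acts).abs().mean().item()
print(f"mean abs reconstruction error: {err:.4f}")
```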