y0news

#int4-quantization News & Analysis

1 article tagged with #int4-quantization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 6h ago · 7/10

When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

Researchers show that quantizing the KV cache to int4 on Apple Silicon's unified memory architecture improves performance over fp16 instead of costing it: inference runs 3-8% faster while memory use drops threefold. The result inverts the usual quality-latency tradeoff of quantization. It relies on a single fused Metal kernel that combines a sign-randomized FFT, per-channel scaling, and int4 packing, with results spanning models of 1B to 1.5B parameters.
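The summary names three stages that the kernel fuses. The pure-Python reference below is a minimal sketch of such a pipeline under our own assumptions, not the paper's code. It swaps in a normalized Walsh-Hadamard transform for the FFT to stay real-valued, uses a symmetric per-channel scale of max|x|/7, and packs two signed int4 values per byte, low nibble first. All function names are hypothetical, and it runs on the CPU rather than as a Metal kernel.

```python
import random

QMIN, QMAX = -8, 7  # signed int4 range (assumed symmetric scheme)


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (stand-in for the FFT).
    Self-inverse; len(vec) must be a power of two."""
    x = list(vec)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    inv_norm = 1.0 / n ** 0.5
    return [v * inv_norm for v in x]


def rotate(row, signs):
    """Flip signs with a fixed random +/-1 vector, then apply the transform."""
    return fwht([s * v for s, v in zip(signs, row)])


def unrotate(row, signs):
    """Inverse of rotate: the transform is self-inverse, then undo the signs."""
    return [s * v for s, v in zip(signs, fwht(row))]


def pack_int4(values):
    """Pack pairs of signed int4 values into bytes, low nibble first."""
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data):
    """Inverse of pack_int4: sign-extend each 4-bit nibble."""
    vals = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals


def quantize_kv(rows, signs):
    """rows: per-token vectors (tokens x channels).
    Returns packed int4 rows and one float scale per channel."""
    rotated = [rotate(r, signs) for r in rows]
    scales = []
    for c in range(len(signs)):
        amax = max(abs(r[c]) for r in rotated)
        scales.append(amax / QMAX if amax > 0 else 1.0)
    packed = []
    for r in rotated:
        q = [max(QMIN, min(QMAX, round(v / s))) for v, s in zip(r, scales)]
        packed.append(pack_int4(q))
    return packed, scales


def dequantize_kv(packed, scales, signs):
    """Unpack, rescale per channel, and undo the rotation."""
    return [unrotate([q * s for q, s in zip(unpack_int4(p), scales)], signs)
            for p in packed]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, tokens = 8, 4
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    kv = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]
    packed, scales = quantize_kv(kv, signs)
    recon = dequantize_kv(packed, scales, signs)
    err = max(abs(a - b) for ra, rb in zip(kv, recon) for a, b in zip(ra, rb))
    print(f"bytes per token row: {len(packed[0])} (fp16: {2 * dim})")
    print(f"max abs reconstruction error: {err:.3f}")
```

In this sketch an 8-channel token row packs into 4 bytes versus 16 in fp16, and the per-channel scales are extra storage on top of that. The random sign flip before the orthonormal transform spreads outlier channels across all channels, which keeps the per-channel int4 scales from being dominated by a few large values.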

🏢 Hugging Face