y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#low-bit-computation News & Analysis

1 article tagged with #low-bit-computation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBullisharXiv – CS AI · Apr 77/10
🧠

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Researchers have developed a new low-bit mixed-precision attention kernel called Diagonal-Tiled Mixed-Precision Attention (DMA) that significantly speeds up large language model inference on NVIDIA B200 GPUs while maintaining generation quality. The technique uses microscaling floating-point (MXFP) data format and kernel fusion to address the high computational costs of transformer-based models.

🏢 Nvidia