y0news

#kernel-fusion News & Analysis

3 articles tagged with #kernel-fusion. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Researchers have developed a low-bit mixed-precision attention kernel, Diagonal-Tiled Mixed-Precision Attention (DMA), that significantly speeds up large language model inference on NVIDIA B200 GPUs while maintaining generation quality. The technique combines the microscaling floating-point (MXFP) data format with kernel fusion to reduce the high computational cost of transformer-based models.

🏢 Nvidia
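
For readers unfamiliar with the MXFP format named in the summary above, here is a minimal sketch of microscaling block quantization. It is not the DMA kernel from the paper. It assumes FP4 (E2M1) elements and a block of 32 values sharing one power-of-two scale, as in the OCP Microscaling format; the function names are hypothetical, and rounding is simplified to nearest-value with ties going to the smaller magnitude.

```python
import math

# Representable magnitudes of an FP4 (E2M1) element (assumed element format).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2      # exponent of the largest E2M1 binade (6.0 = 1.5 * 2**2)
BLOCK_SIZE = 32   # number of elements that share one scale (assumed)


def quantize_block(block):
    """Return (shared_exp, codes) for one block of floats.

    shared_exp is the exponent of the shared power-of-two scale;
    codes are signed values taken from the FP4 grid.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Pick the scale so the largest element lands in the top FP4 binade.
    shared_exp = math.floor(math.log2(amax)) - FP4_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for v in block:
        # Saturate to the largest representable magnitude, then snap to the grid.
        mag = min(abs(v) / scale, FP4_E2M1_GRID[-1])
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(math.copysign(nearest, v))
    return shared_exp, codes


def dequantize_block(shared_exp, codes):
    """Reconstruct approximate floats from the shared exponent and FP4 codes."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


if __name__ == "__main__":
    block = [8.0, 1.0, -3.0, 0.1] + [0.0] * (BLOCK_SIZE - 4)
    exp, codes = quantize_block(block)
    print(exp, dequantize_block(exp, codes)[:4])
```

Because the scale is a power of two shared across the block, small values next to a large outlier lose precision, which is one reason mixed-precision schemes keep some tiles at higher bit widths.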
AI · Neutral · Hugging Face Blog · Jun 11 · 6/10
Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

This article demonstrates PyTorch profiling techniques for optimizing neural network performance, comparing standard nn.Linear layers with a fused MLP implementation. It shows how developer-level optimization can significantly improve model efficiency, which matters for both open-source ML communities and production deployments.