#kernel-fusion News & Analysis

3 articles tagged with #kernel-fusion. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Researchers have developed Diagonal-Tiled Mixed-Precision Attention (DMA), a low-bit mixed-precision attention kernel that significantly speeds up large language model inference on NVIDIA B200 GPUs while maintaining generation quality. The technique combines the microscaling floating-point (MXFP) data format with kernel fusion to reduce the high computational cost of transformer-based models.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators

RedFuser is a new automated framework that optimizes AI model deployment by fusing cascaded reduction operations into single loops, achieving 2-5x performance improvements. The system addresses limitations in existing AI compilers that struggle with complex multi-loop operations like those found in attention mechanisms.
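For readers unfamiliar with the term, a cascaded reduction is a chain in which one reduction consumes the result of another. The sketch below is not RedFuser's method. It uses the well-known "online softmax" rewrite as a hand-written illustration of the general idea, and the function names `softmax_two_pass` and `softmax_fused` are hypothetical. It assumes finite inputs.

```python
import math


def softmax_two_pass(xs):
    """Unfused baseline: the sum reduction needs the finished max reduction."""
    m = max(xs)                              # reduction 1: running maximum
    s = sum(math.exp(x - m) for x in xs)     # reduction 2: depends on m
    return [math.exp(x - m) / s for x in xs]


def softmax_fused(xs):
    """Both reductions fused into one loop over the input."""
    m = float("-inf")   # running maximum seen so far
    s = 0.0             # running sum of exp(x - m), kept relative to m
    for x in xs:
        m_new = max(m, x)
        # Rescale the partial sum to the new maximum, then add the new term.
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / s for x in xs]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, -4.0]
    print(softmax_two_pass(data))
    print(softmax_fused(data))
```

Both functions should agree up to floating-point rounding. The fused version reads the input once for the two reductions instead of twice, which is the kind of saving a fusion framework aims to obtain automatically.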

AI · Neutral · Hugging Face Blog · Jun 11 · 6/10

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

This article demonstrates PyTorch profiling techniques for optimizing neural network performance, comparing standard nn.Linear layers with fused MLP implementations. It shows how developer-level optimization practices can significantly improve model efficiency, which is relevant to both open-source ML communities and production deployments.