AIBullisharXiv – CS AI · 7h ago7/10
🧠
TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs
TileFuse is a new kernel library that enables efficient quantized large language model inference on AMD's XDNA2 NPUs by supporting industry-standard quantization formats like AWQ directly, rather than requiring model reshaping. The technology delivers up to 2x improvements in latency and energy efficiency on edge devices, making practical LLM deployment on consumer hardware substantially more viable.