AI · Bearish · arXiv – CS AI · 3h ago · 6/10
When NPUs Are Not Always Faster: A Stage-Level Analysis of Mobile LLM Inference
A research study finds that Neural Processing Units (NPUs) on mobile devices do not consistently accelerate LLM inference: CPUs outperform NPUs on the compute-intensive prefill stage, and NPUs deliver only marginal speedups on the memory-bound decode stage. The findings challenge assumptions about heterogeneous mobile computing and suggest that current NPU designs need architectural improvements for on-device AI workloads.
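The compute-intensive versus memory-bound distinction in the summary can be illustrated with a rough roofline-style estimate. The sketch below is not taken from the study: the function names, the model width, and the hardware figures are all hypothetical, chosen only to show why a many-token prefill matmul tends to be limited by compute while a single-token decode matmul tends to be limited by memory bandwidth.

```python
def arithmetic_intensity(n_tokens: int, d_model: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte for one (n_tokens x d_model) @ (d_model x d_model) matmul.

    Simplifying assumption: weights are read once, activations are read once
    and written once, and nothing is served from cache.
    """
    flops = 2 * n_tokens * d_model * d_model            # one multiply + one add per term
    weight_bytes = d_model * d_model * bytes_per_value
    activation_bytes = 2 * n_tokens * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


def is_memory_bound(intensity: float, peak_flops: float, bandwidth: float) -> bool:
    """Roofline test: memory-bound when intensity is below the ridge point."""
    return intensity < peak_flops / bandwidth


# Hypothetical accelerator and model figures, for illustration only.
PEAK_FLOPS = 10e12   # 10 TFLOP/s
BANDWIDTH = 50e9     # 50 GB/s
D_MODEL = 4096       # hidden size
PROMPT_TOKENS = 512  # tokens processed together during prefill

prefill = arithmetic_intensity(PROMPT_TOKENS, D_MODEL)
decode = arithmetic_intensity(1, D_MODEL)  # decode handles one new token per step

for stage, intensity in (("prefill", prefill), ("decode", decode)):
    bound = "memory" if is_memory_bound(intensity, PEAK_FLOPS, BANDWIDTH) else "compute"
    print(f"{stage}: {intensity:.1f} FLOPs/byte -> {bound}-bound")
```

Under these assumed figures, decode performs roughly one FLOP per byte moved, so extra compute throughput has little to work with, which is consistent with the marginal decode speedups the summary reports. The estimate says nothing about why a CPU would beat an NPU on prefill; that is a measured result of the study that this simple model does not capture.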