Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU
Xe-Forge is an LLM-powered system that automates kernel optimization for Intel GPUs, removing the repetitive manual porting work that typically gates algorithm deployment on new accelerators. On a test set of 97 kernels, it achieved a 1.17x geometric mean speedup: 67% of kernels improved, and some gained more than 5x. These results indicate that structured domain knowledge combined with hardware-in-the-loop verification can systematically accelerate hardware adoption.
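
For readers unfamiliar with the headline metrics, the sketch below shows one common way a geometric mean speedup and a fraction-improved figure can be computed from per-kernel timings. It is an illustration only: the function names and the timing values are hypothetical, they are not taken from the Xe-Forge evaluation, and the source does not state exactly how its figures were computed.

```python
import math

def per_kernel_speedups(baseline_ms, optimized_ms):
    """Speedup of each kernel: baseline time divided by optimized time."""
    return [b / o for b, o in zip(baseline_ms, optimized_ms)]

def geometric_mean(values):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def fraction_improved(speedups):
    """Share of kernels whose speedup is strictly greater than 1.0."""
    return sum(1 for s in speedups if s > 1.0) / len(speedups)

# Hypothetical timings in milliseconds (NOT data from the source).
baseline_ms = [4.0, 2.0, 1.0, 8.0]
optimized_ms = [2.0, 2.0, 2.0, 1.0]

speedups = per_kernel_speedups(baseline_ms, optimized_ms)
geomean = geometric_mean(speedups)
improved = fraction_improved(speedups)

print(f"geometric mean speedup: {geomean:.2f}x")
print(f"kernels improved: {improved:.0%}")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 2x slowdown cancel to 1.0x, whereas an arithmetic mean would report a net gain. This is also why a modest geometric mean can coexist with a few kernels that gain far more than the average.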
