GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization
Researchers demonstrate that large language models can effectively forecast GPU kernel performance, reducing expensive on-device evaluations during optimization searches. By acting as selective surrogates that know their confidence limits, LLMs enable kernel searches to evaluate multiple candidates under fixed GPU budgets, ultimately discovering faster kernels than baseline approaches.