Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages
Researchers introduce KLineage, a system that teaches LLM-based agents when to apply GPU kernel optimizations by learning from expert implementations through backward validation rather than forward trial-and-error. The approach extracts reusable optimization skills that encode not just what optimizations work, but the conditions and contexts where they're valid, demonstrating improved kernel quality over existing memory-based baselines.
KLineage addresses a fundamental gap in AI-assisted GPU kernel development: knowing which optimizations to apply in which situations. Large language models can generate code and describe many optimization techniques, but they often lack the contextual knowledge to judge when a technique is actually sound. KLineage tackles this by reversing the usual learning process. Instead of having models learn through forward rollouts and trial-and-error, it extracts optimization skills from expert kernels by walking backward through validated simplifications. Each extracted skill is a reusable component that captures not only the optimization's intent but also its applicability conditions, side effects, and failure modes.
The work reflects a growing recognition that AI code generation requires more than pattern matching; it demands an understanding of domain-specific constraints and safety boundaries. GPU kernel optimization is an especially high-stakes domain, because performance gains translate directly into lower computational costs and energy consumption across data centers.
The validation-gated approach adds compile-time correctness checks and performance profiling, reducing the risk of deploying faulty optimizations. Evaluation across five expert workloads and two NVIDIA architectures demonstrates practical applicability, and a held-out test set helps verify that the system learned generalizable skills rather than memorizing specific cases.
For developers building AI infrastructure, better GPU kernel optimization means lower deployment costs and latency, which are direct economic benefits in a competitive AI development landscape. The methodology could also extend beyond kernels to other specialized code-optimization domains.
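To make the backward-extraction idea concrete, here is a minimal Python sketch under heavily simplified assumptions; it is not KLineage's implementation. A kernel is modeled only as the set of optimizations applied to it, `validate` and `runtime_ms` are hypothetical stand-ins for compiling/testing and profiling, and every name (`Skill`, `extract_skills`, `shared_mem_tiling`, and so on) and every number is made up for illustration. Starting from the expert kernel, the sketch peels off one optimization at a time, accepts a simplification only if the simpler kernel still validates, and records the forward step as a skill whose preconditions are the optimizations still present. Side effects and failure modes, which KLineage skills also capture, are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# Toy model (assumption): a kernel is just the set of optimizations applied to it.
Kernel = FrozenSet[str]


@dataclass
class Skill:
    name: str                 # optimization intent, e.g. "shared_mem_tiling"
    preconditions: List[str]  # optimizations already present when it was applied
    speedup: float            # runtime without it / runtime with it


def extract_skills(expert: Kernel,
                   validate: Callable[[Kernel], bool],
                   runtime_ms: Callable[[Kernel], float]) -> List[Skill]:
    """Walk backward from an expert kernel, removing one optimization per step.

    A simplification is accepted only if the simpler kernel still validates.
    Read forward, each accepted step becomes a skill whose preconditions are
    the optimizations still present in the simpler kernel.
    """
    skills: List[Skill] = []
    current = expert
    while current:
        for opt in sorted(current):  # deterministic order for the sketch
            simpler = current - {opt}
            if validate(simpler):
                skills.append(Skill(
                    name=opt,
                    preconditions=sorted(simpler),
                    speedup=runtime_ms(simpler) / runtime_ms(current),
                ))
                current = simpler
                break
        else:
            break  # no single removal validates; stop instead of guessing
    return skills


# --- Hypothetical demo workload ---------------------------------------------
FACTORS = {"shared_mem_tiling": 0.5, "software_pipelining": 0.8,
           "vectorized_loads": 0.9}  # made-up runtime multipliers


def validate(k: Kernel) -> bool:
    # Assumed dependency: software pipelining is only valid on top of tiling.
    return not ("software_pipelining" in k and "shared_mem_tiling" not in k)


def runtime_ms(k: Kernel) -> float:
    t = 10.0  # made-up runtime of the naive kernel
    for opt in k:
        t *= FACTORS[opt]
    return t


if __name__ == "__main__":
    for s in extract_skills(frozenset(FACTORS), validate, runtime_ms):
        print(s.name, s.preconditions, round(s.speedup, 2))
    # software_pipelining ['shared_mem_tiling', 'vectorized_loads'] 1.25
    # shared_mem_tiling ['vectorized_loads'] 2.0
    # vectorized_loads [] 1.11
```

In the demo, removing `shared_mem_tiling` first is rejected because the hypothetical `validate` says pipelining depends on it, so the extracted `software_pipelining` skill lists `shared_mem_tiling` among its preconditions. That is the sense in which a validated backward walk can recover *when* an optimization applies, not just *what* it is.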
- KLineage extracts optimization skills from expert GPU kernels using backward validation rather than forward trial-and-error learning
- Each learned skill encodes when an optimization applies, what conditions validate it, and what assumptions it relies on
- The system outperformed memory-based LLM baselines in both kernel quality and optimization efficiency on the tested workloads
- Validation gates check that code compiles correctly and profile its performance before optimizations are applied to new code
- Held-out testing indicates that the approach learns generalizable skills rather than memorizing specific code patterns
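The validation-gate idea can be illustrated with a small Python sketch. It is not KLineage's gate: the "kernels" here are plain Python functions supplied as source strings, Python's built-in `compile()` stands in for a GPU compiler, comparing outputs against a trusted baseline on sample inputs is an assumption about what a correctness check involves, and `time.perf_counter` timing stands in for GPU profiling. The names `validation_gate`, `GateResult`, `load_kernel`, and the `min_speedup` threshold are hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class GateResult:
    accepted: bool
    reason: str
    speedup: Optional[float] = None


def load_kernel(source: str, entry: str) -> Callable:
    """Compile stage. Python's built-in compile() stands in for a GPU compiler."""
    code = compile(source, "<candidate>", "exec")  # raises SyntaxError on bad source
    namespace: dict = {}
    exec(code, namespace)
    return namespace[entry]


def best_time(fn: Callable, inputs: Sequence[int], repeats: int = 5) -> float:
    """Profiling stage: best-of-N wall-clock time over all inputs."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        best = min(best, time.perf_counter() - start)
    return max(best, 1e-12)  # guard against a zero reading on coarse clocks


def validation_gate(baseline_src: str, candidate_src: str, entry: str,
                    inputs: Sequence[int], rel_tol: float = 1e-9,
                    min_speedup: float = 1.0) -> GateResult:
    """Accept a rewritten kernel only if it compiles, matches the baseline, and is fast enough."""
    try:
        candidate = load_kernel(candidate_src, entry)
    except SyntaxError as err:
        return GateResult(False, f"compile failed: {err.msg}")
    baseline = load_kernel(baseline_src, entry)

    for x in inputs:  # correctness stage: compare against the trusted baseline
        try:
            got = candidate(x)
        except Exception as err:
            return GateResult(False, f"runtime error on input {x}: {err!r}")
        if not math.isclose(got, baseline(x), rel_tol=rel_tol):
            return GateResult(False, f"wrong result for input {x}")

    speedup = best_time(baseline, inputs) / best_time(candidate, inputs)
    if speedup < min_speedup:
        return GateResult(False, f"too slow: {speedup:.2f}x", speedup)
    return GateResult(True, "accepted", speedup)


# Hypothetical "kernels": each computes the sum of i*i for i in range(n).
BASELINE = """
def kernel(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
"""
CLOSED_FORM = """
def kernel(n):
    return (n - 1) * n * (2 * n - 1) // 6
"""
OFF_BY_ONE = """
def kernel(n):
    return n * (n + 1) * (2 * n + 1) // 6
"""

if __name__ == "__main__":
    inputs = [0, 1, 10, 1000]
    print(validation_gate(BASELINE, CLOSED_FORM, "kernel", inputs))
    print(validation_gate(BASELINE, OFF_BY_ONE, "kernel", inputs).reason)
    # wrong result for input 1
    print(validation_gate(BASELINE, "def kernel(n)\n    return n\n", "kernel", inputs).reason)
```

Ordering the stages from cheapest to most expensive means a candidate that fails to compile or returns a wrong answer is rejected before any time is spent profiling it; only rewrites that pass all three stages are kept.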