arXiv – CS AI
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
CuTeGen is an LLM-based agentic framework that automates GPU kernel generation and optimization using the CuTe abstraction layer. It follows a generate-test-refine workflow with delayed performance profiling and reports a 1.71× average speedup over PyTorch on standardized benchmarks, outperforming prior agentic approaches.
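
To make the generate-test-refine idea concrete, here is a minimal Python sketch of such a loop. It is not CuTeGen's implementation: the summary gives no implementation details, so every name here (`generate_test_refine`, `Attempt`, and the `generate`, `is_correct`, and `profile` callables) is hypothetical. It also assumes one plausible reading of "delayed performance profiling", namely that a candidate is timed only after it passes the correctness test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    source: str                         # candidate kernel source text
    passed: bool                        # did it pass the correctness test?
    runtime_ms: Optional[float] = None  # stays None if never profiled


def generate_test_refine(
    generate: Callable[[str], str],     # hypothetical: feedback -> candidate source
    is_correct: Callable[[str], bool],  # hypothetical: correctness check
    profile: Callable[[str], float],    # hypothetical: runtime in milliseconds
    max_iters: int = 8,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Return the fastest correct attempt (or None) and the full history."""
    feedback = ""
    best: Optional[Attempt] = None
    history: List[Attempt] = []
    for _ in range(max_iters):
        src = generate(feedback)
        if not is_correct(src):
            # Failing candidates are never profiled (the assumed "delay").
            history.append(Attempt(src, False))
            feedback = "correctness test failed"
            continue
        ms = profile(src)
        attempt = Attempt(src, True, ms)
        history.append(attempt)
        if best is None or ms < best.runtime_ms:
            best = attempt
        feedback = f"correct; runtime {ms:.3f} ms; try to reduce it"
    return best, history


# Toy stand-ins for the LLM, the test harness, and the profiler.
candidates = iter(["buggy", "slow", "fast"])
toy_runtimes = {"slow": 2.0, "fast": 1.0}
profile_calls: List[str] = []


def toy_generate(feedback: str) -> str:
    return next(candidates)


def toy_is_correct(src: str) -> bool:
    return src != "buggy"


def toy_profile(src: str) -> float:
    profile_calls.append(src)
    return toy_runtimes[src]


best, history = generate_test_refine(
    toy_generate, toy_is_correct, toy_profile, max_iters=3
)
```

In the toy run, the first candidate fails the correctness check and is therefore never timed. Only the two correct candidates reach the profiler, and the faster one is kept as `best`.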