arXiv - CS AI · 5h ago
CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
Researchers introduce CUDABench, a comprehensive benchmark for evaluating the ability of Large Language Models (LLMs) to generate CUDA code from text descriptions. The benchmark reveals significant challenges: high compilation success rates paired with low functional correctness, a lack of domain-specific knowledge, and poor GPU hardware utilization.