AI · Bullish · arXiv – CS AI · 18h ago · 7/10
AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference
AgentCompile is an LLM-guided CUDA inference compiler: it uses large language models to decide how transformer model execution should be specialized for the GPU, and it validates those decisions empirically. The system reports speedups of 4x to 5.66x over PyTorch on popular models such as Qwen and Llama.