y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#compiler-optimization News & Analysis

4 articles tagged with #compiler-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBullisharXiv – CS AI · 4d ago7/10
🧠

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU

Xe-Forge is an LLM-powered system that automates kernel optimization for Intel GPUs, eliminating repetitive manual porting work that typically gates algorithm deployment on new accelerators. Testing on 97 kernels achieved 1.17x geometric mean speedup with 67% of kernels improving and some exceeding 5x gains, demonstrating that structured domain knowledge combined with hardware-in-the-loop verification can systematically accelerate hardware adoption.

AINeutralarXiv – CS AI · May 126/10
🧠

LLM Translation of Compiler Intermediate Representation

Researchers introduce IRIS-14B, a 14-billion-parameter LLM fine-tuned to translate compiler intermediate representations between GCC's GIMPLE and LLVM IR, achieving up to 44 percentage points higher accuracy than existing state-of-the-art models. The approach demonstrates how LLMs can function as interoperability layers in hybrid compiler architectures, enabling cross-toolchain workflows without modifying existing compiler infrastructure.

AIBullisharXiv – CS AI · Mar 266/10
🧠

AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

Researchers introduce AscendOptimizer, an AI agent that optimizes operators for Huawei's Ascend NPUs through evolutionary search and experience-based learning. The system achieved 1.19x geometric-mean speedup over baselines on 127 real operators, with nearly 50% outperforming reference implementations.

AIBullisharXiv – CS AI · Mar 27/1013
🧠

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Researchers developed CUDA Agent, a reinforcement learning system that significantly outperforms existing methods for GPU kernel optimization, achieving 100% faster performance than torch.compile on benchmark tests. The system uses large-scale agentic RL with automated verification and profiling to improve CUDA kernel generation, addressing a critical bottleneck in deep learning performance.