#compiler-optimization News & Analysis

9 articles tagged with #compiler-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

AI-PROPELLER introduces the first warehouse-scale interprocedural code layout optimization system, using an evolutionary AI workflow to improve binary performance by 0.23-1.6% beyond existing post-link optimizers. The work applies AI-driven search to compiler optimization in industrial production environments and reports measurable real-world performance gains.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU

Xe-Forge is an LLM-powered system that automates kernel optimization for Intel GPUs, eliminating repetitive manual porting work that typically gates algorithm deployment on new accelerators. In tests on 97 kernels, it achieved a 1.17x geometric-mean speedup, with 67% of kernels improving and some exceeding 5x gains. The results suggest that structured domain knowledge combined with hardware-in-the-loop verification can systematically accelerate hardware adoption.
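
For readers unfamiliar with the metrics quoted in this summary, the following is a minimal sketch of how a geometric-mean speedup and a "share of kernels improved" figure are commonly computed from per-kernel timings. The function names and the timing values are hypothetical and chosen only for illustration; they are not taken from the Xe-Forge paper, whose exact measurement protocol is not described here.

```python
import math


def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    ratios = [b / o for b, o in zip(baseline_times, optimized_times)]
    # Averaging in log space is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def fraction_improved(baseline_times, optimized_times):
    """Share of kernels whose optimized time is strictly lower than the baseline."""
    improved = sum(1 for b, o in zip(baseline_times, optimized_times) if o < b)
    return improved / len(baseline_times)


# Hypothetical timings in milliseconds for four kernels (illustrative only).
baseline = [2.0, 4.0, 1.0, 8.0]
optimized = [1.0, 4.0, 2.0, 1.0]

print(f"geomean speedup: {geometric_mean_speedup(baseline, optimized):.3f}x")
print(f"kernels improved: {fraction_improved(baseline, optimized):.0%}")
```

The geometric mean is the usual choice for speedup ratios because a 2x gain and a 2x regression cancel to 1x. This is also why a modest headline figure such as 1.17x can coexist with a few kernels exceeding 5x.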

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Provenance Tracking in AI Compilers through the Lens of Coalgebra

Researchers present a coalgebra-based approach to tracking tensor and operator provenance through AI compiler transformations, addressing the challenge of maintaining computational lineage during aggressive graph rewrites. The method uses observational semantics rather than identifier propagation, with a prototype implementation called COVAN demonstrating practical viability with minimal engineering overhead.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

Researchers demonstrate a two-stage methodology for deploying large language models end-to-end on energy-efficient spatial NPUs, progressing from human-guided optimization to fully autonomous agent deployment. The approach achieves significant performance improvements and successfully deploys eight additional LLM variants on AMD XDNA 2 NPUs with minimal human intervention, marking the first open-source deployments of these models on AMD hardware.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

FuseFSS: Efficient Secure LLM Inference with Function Secret Sharing

FuseFSS is a new compiler that streamlines secure LLM inference by consolidating fragmented protocol designs into a unified pipeline, achieving 1.24x-1.50x speedup and reducing communication overhead by 9-16% compared to existing function secret sharing approaches. The technology enables privacy-preserving queries to large language models without revealing user prompts, addressing a critical bottleneck in cryptographic systems for AI inference.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

TensorBench: Benchmarking Coding Agents on a Compiler-Based Tensor Framework

Researchers introduced TensorBench, a 199-task benchmark for evaluating coding agents on a PyTorch-based tensor framework, addressing the trade-off between task difficulty and evaluation reliability in repository-level coding benchmarks. Tests of seven frontier AI models revealed significant performance variation, with pass rates ranging from 22.1% to 64.8%, suggesting distinct strengths across different coding agent architectures.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLM Translation of Compiler Intermediate Representation

Researchers introduce IRIS-14B, a 14-billion-parameter LLM fine-tuned to translate compiler intermediate representations between GCC's GIMPLE and LLVM IR, achieving up to 44 percentage points higher accuracy than existing state-of-the-art models. The approach demonstrates how LLMs can function as interoperability layers in hybrid compiler architectures, enabling cross-toolchain workflows without modifying existing compiler infrastructure.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

Researchers introduce AscendOptimizer, an AI agent that optimizes operators for Huawei's Ascend NPUs through evolutionary search and experience-based learning. The system achieved a 1.19x geometric-mean speedup over baselines on 127 real operators, with nearly 50% of them outperforming their reference implementations.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 13

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Researchers developed CUDA Agent, a reinforcement learning system that significantly outperforms existing methods for GPU kernel optimization, achieving 100% faster performance than torch.compile on benchmark tests. The system uses large-scale agentic RL with automated verification and profiling to improve CUDA kernel generation, addressing a critical bottleneck in deep learning performance.