y0news
#gpu-optimization
3 articles
AI · Bullish · arXiv – CS AI · 6h ago · 8

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Researchers developed a data-driven pipeline to optimize GPU efficiency for distributed LLM adapter serving, achieving sub-5% throughput estimation error while running 90x faster than full benchmarking. The system uses a Digital Twin, machine learning models, and greedy placement algorithms to minimize GPU requirements while serving hundreds of adapters concurrently.

AI · Bullish · arXiv – CS AI · 6h ago · 5

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Researchers developed CUDA Agent, a reinforcement learning system for GPU kernel optimization that significantly outperforms existing methods, reportedly achieving a 100% faster rate over torch.compile on benchmark tests. The system uses large-scale agentic RL with automated verification and profiling to improve CUDA kernel generation, addressing a critical bottleneck in deep learning performance.

AI · Bullish · arXiv – CS AI · 6h ago · 5

OM2P: Offline Multi-Agent Mean-Flow Policy

Researchers propose OM2P, a new offline multi-agent reinforcement learning algorithm that achieves efficient one-step action sampling using mean-flow models. The approach delivers up to 3.8x reduction in GPU memory usage and 10.8x speed-up in training time compared to existing diffusion and flow-based models.