11 articles tagged with #cuda. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠 Researchers developed Model2Kernel, a system that automatically detects memory safety bugs in CUDA kernels used for large language model inference. The system discovered 353 previously unknown bugs across popular platforms like vLLM and Hugging Face with only nine false positives.
🏢 Hugging Face
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 Researchers introduce CUDABench, a comprehensive benchmark for evaluating Large Language Models' ability to generate CUDA code from text descriptions. The benchmark reveals significant challenges: models compile successfully at high rates but produce functionally incorrect code, lack domain-specific knowledge, and utilize GPU hardware poorly.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers developed GPUTOK, a GPU-accelerated tokenizer for large language models that processes text significantly faster than existing CPU-based solutions. The optimized version shows 1.7x speed improvement over tiktoken and 7.6x over HuggingFace's GPT-2 tokenizer while maintaining output quality.
AI · Bullish · OpenAI News · Jul 28 · 7/10 · 6
🧠 OpenAI has released Triton 1.0, an open-source Python-like programming language that lets researchers without CUDA expertise write highly efficient GPU code for neural networks. The tool aims to democratize GPU programming while maintaining performance comparable to expert-written CUDA code.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a novel self-indexing KV cache system that unifies compression and retrieval for efficient sparse attention in large language models. The method uses 1-bit vector quantization and integrates with FlashAttention to reduce memory bottlenecks in long-context LLM inference.
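The 1-bit idea can be sketched in a few lines of NumPy (a toy illustration, not the paper's actual scheme): each cached vector is reduced to its sign bits plus a single per-vector scale, a 32x reduction over fp32 values.

```python
import numpy as np

def quantize_1bit(v):
    """Compress a float vector to packed sign bits plus one scale."""
    scale = float(np.abs(v).mean())     # single magnitude for the whole vector
    bits = np.packbits(v >= 0)          # 1 bit per element (32x smaller than fp32)
    return bits, scale, v.size

def dequantize_1bit(bits, scale, n):
    """Reconstruct +/- scale from the packed sign bits."""
    signs = np.unpackbits(bits, count=n).astype(np.float32) * 2.0 - 1.0
    return signs * scale

rng = np.random.default_rng(0)
v = rng.standard_normal(1024).astype(np.float32)
bits, scale, n = quantize_1bit(v)
v_hat = dequantize_1bit(bits, scale, n)
```

Signs are preserved exactly while magnitudes collapse to one scalar, which is why such schemes are paired with careful retrieval rather than used for exact reconstruction.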
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 TiledAttention is a new CUDA-based scaled dot-product attention kernel for PyTorch that enables easier modification of attention mechanisms for AI research. It provides a balance between performance and customizability, delivering significant speedups over standard attention implementations while remaining directly editable from Python.
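The computation such kernels accelerate is standard scaled dot-product attention, softmax(QKᵀ/√d)V; a plain NumPy reference (for clarity, not performance) looks like this:

```python
import numpy as np

def sdpa(Q, K, V):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (Lq, Lk) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                              # convex mix of value rows

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = sdpa(Q, K, V)
```

A fused CUDA kernel computes the same result without materializing the full (Lq, Lk) weight matrix in slow memory, which is where the speedup comes from.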
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 13
🧠 Researchers developed CUDA Agent, a reinforcement learning system that significantly outperforms existing methods for GPU kernel optimization, producing kernels 100% faster than torch.compile on benchmark tests. The system uses large-scale agentic RL with automated verification and profiling to improve CUDA kernel generation, addressing a critical bottleneck in deep learning performance.
AI · Bullish · Hugging Face Blog · Jan 28 · 6/10 · 5
🧠 The article discusses using Claude AI to build CUDA kernels and teach open-source models, demonstrating AI's capability in low-level programming and knowledge transfer. This represents a significant advancement in AI-assisted development and model training techniques.
AI · Bullish · OpenAI News · Dec 6 · 6/10 · 7
🧠 OpenAI has released highly optimized GPU kernels for block-sparse neural network architectures that can run orders of magnitude faster than existing solutions like cuBLAS or cuSPARSE. The kernels have achieved state-of-the-art results in text sentiment analysis and generative modeling applications.
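The core idea behind block-sparse kernels can be shown in a small NumPy sketch (illustrative only; the real kernels run this on GPU tensor cores): only the nonzero blocks of the matrix are stored and multiplied, and zero blocks are skipped entirely.

```python
import numpy as np

def block_sparse_matmul(blocks, layout, x, bs):
    """y = A @ x where only the listed bs x bs blocks of A are nonzero.

    blocks: dict {(bi, bj): (bs, bs) array}; layout: nonzero coordinates.
    Zero blocks are never stored or multiplied -- skipping them is the
    source of the speedup over a dense GEMM.
    """
    n_rows = (max(bi for bi, _ in layout) + 1) * bs
    y = np.zeros((n_rows,) + x.shape[1:])
    for bi, bj in layout:
        y[bi * bs:(bi + 1) * bs] += blocks[(bi, bj)] @ x[bj * bs:(bj + 1) * bs]
    return y

rng = np.random.default_rng(0)
bs = 4
layout = [(0, 0), (1, 0), (1, 1)]            # lower-triangular block pattern
blocks = {c: rng.standard_normal((bs, bs)) for c in layout}
x = rng.standard_normal((2 * bs, 3))
y = block_sparse_matmul(blocks, layout, x, bs)
```

For a layout that is, say, 5% dense, work and memory traffic shrink proportionally, which is how such kernels beat dense cuBLAS calls on sparse architectures.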
AI · Neutral · Hugging Face Blog · Aug 18 · 4/10 · 4
🧠 This appears to be a technical guide covering the development and scaling of CUDA kernels for GPU computing applications. The article likely provides practical insights for developers looking to build production-ready GPU-accelerated solutions.
AI · Neutral · MarkTechPost · Apr 6 · 4/10
🧠 A technical tutorial demonstrates implementing NVIDIA's Transformer Engine with mixed-precision acceleration, covering GPU setup, CUDA compatibility verification, and fallback execution handling. The guide focuses on practical deep learning workflow optimization using FP8 precision and benchmarking techniques.
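The compatibility-check-and-fallback logic such tutorials describe can be sketched in plain Python. The capability thresholds below are real (FP8 tensor cores first appeared on Ada, compute capability 8.9, and Hopper, 9.0), but the selection policy and function name are illustrative, not the tutorial's code:

```python
def pick_precision(capability):
    """Choose a training dtype from a CUDA compute-capability tuple.

    FP8 tensor cores first appeared on Ada (8.9) and Hopper (9.0);
    Ampere (8.0) and newer support bfloat16; older GPUs fall back to
    float16; no GPU at all means a float32 CPU path.
    """
    if capability is None:        # no CUDA device detected: CPU fallback
        return "float32"
    if capability >= (8, 9):      # Ada / Hopper: FP8 available
        return "fp8"
    if capability >= (8, 0):      # Ampere: bfloat16 tensor cores
        return "bfloat16"
    return "float16"              # older parts: fp16 mixed precision
```

In PyTorch, the capability tuple would come from `torch.cuda.get_device_capability()` when `torch.cuda.is_available()` returns true.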
🏢 Nvidia