12 articles tagged with #hardware-acceleration. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 EdgeCIM presents a specialized hardware-software framework designed to accelerate Small Language Model inference on edge devices by addressing memory-bandwidth bottlenecks inherent in autoregressive decoding. The system achieves significant performance and energy improvements over existing mobile accelerators, reaching 7.3x higher throughput than NVIDIA Orin Nano on 1B-parameter models.
🏢 Nvidia
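A back-of-envelope calculation shows why autoregressive decoding is bandwidth-bound, as the summary notes: each generated token must stream essentially all model weights through memory once, so the token rate is capped near bandwidth divided by model size. The bandwidth and weight-precision figures below are illustrative assumptions, not numbers from the paper.

```python
# Rough ceiling on decode speed for a bandwidth-bound model.
# All constants below are illustrative assumptions.
PARAMS = 1e9          # 1B-parameter model, as in the summary
BYTES_PER_PARAM = 2   # fp16/bf16 weights (assumed)
BANDWIDTH = 68e9      # memory bandwidth in bytes/s (assumed edge module)

model_bytes = PARAMS * BYTES_PER_PARAM        # bytes read per token
max_tokens_per_s = BANDWIDTH / model_bytes    # compute power is irrelevant
                                              # once this ceiling binds
```

Under these assumptions the ceiling is about 34 tokens/s no matter how much compute is available, which is why compute-in-memory designs attack the bandwidth term directly.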
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers developed an SRAM-based compute-in-memory accelerator for spiking neural networks that uses linear decay approximation instead of exponential decay, achieving 1.1x to 16.7x reduction in energy consumption. The innovation addresses the bottleneck of neuron state updates in neuromorphic computing by performing in-place decay directly within memory arrays.
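A minimal sketch of the substitution the summary describes, for a leaky integrate-and-fire membrane state: the standard exponential leak multiplies the state by a constant factor each step, while a linear approximation subtracts a fixed amount, which needs no multiplier and can be done in place. The time constant and step size are invented for illustration.

```python
# Exponential vs. linear membrane decay (illustrative constants).
import math

TAU, DT = 20.0, 1.0          # membrane time constant and timestep (ms), assumed
ALPHA = math.exp(-DT / TAU)  # exponential decay factor per step
LINEAR_STEP = 1.0 - ALPHA    # matched per-step linear decrement near v ~ 1

def decay_exponential(v):
    # standard LIF leak: scale the state by a constant factor
    return v * ALPHA

def decay_linear(v):
    # hardware-friendly approximation: subtract a fixed amount,
    # clamped at zero -- a multiplier-free, in-place update
    return max(v - LINEAR_STEP, 0.0)

v_exp = v_lin = 1.0
for _ in range(10):
    v_exp = decay_exponential(v_exp)
    v_lin = decay_linear(v_lin)
```

Over short horizons the two traces track each other, which is the intuition behind trading the exponential for a cheap in-memory subtraction.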
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.
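The QLoRA-style weight split the summary describes can be sketched as follows: an immutable quantized base matrix (which could live in ROM) is combined at inference time with small low-rank adapter factors (which could live in SRAM). The shapes, rank, and int4 scheme here are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a ROM-base + SRAM-LoRA forward pass.
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                       # hidden size and LoRA rank (assumed)

W = rng.standard_normal((d, d)).astype(np.float32)
scale = float(np.abs(W).max()) / 7.0   # symmetric int4 range [-7, 7]
W_q = np.clip(np.round(W / scale), -7, 7).astype(np.int8)  # "ROM" contents

A = (rng.standard_normal((r, d)) * 0.01).astype(np.float32)  # "SRAM" contents
B = (rng.standard_normal((d, r)) * 0.01).astype(np.float32)

def forward(x):
    base = (W_q.astype(np.float32) * scale) @ x  # dequantize-and-matmul
    return base + B @ (A @ x)                    # add low-rank adapter path

x = rng.standard_normal(d).astype(np.float32)
y = forward(x)
```

The design point is that the large matrix never changes after fabrication, so only the tiny `A`/`B` factors need writable memory.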
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves 1.3-3.6x speedup on mixed-precision models while supporting clock frequencies up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.
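The bit-serial arithmetic underlying bitwise systolic arrays can be illustrated in a few lines: a multi-bit multiply decomposes into 1-bit AND/shift/accumulate steps, so the same datapath serves 2-, 4-, or 8-bit operands simply by varying the cycle count. This is a generic sketch of the technique, not the paper's specific architecture.

```python
# Bit-serial multiply: one weight bit per "cycle".
def bit_serial_mul(w: int, x: int, bits: int) -> int:
    """Multiply unsigned w (bits wide) by x via shift-and-accumulate."""
    acc = 0
    for b in range(bits):
        if (w >> b) & 1:      # 1-bit weight slice (an AND gate in hardware)
            acc += x << b     # accumulate the shifted partial product
    return acc

# The same loop handles mixed precision by changing the cycle count:
assert bit_serial_mul(0b101, 9, 3) == 5 * 9    # 3-bit weight
assert bit_serial_mul(0b1011, 9, 4) == 11 * 9  # 4-bit weight
```

Lower-precision weights finish in fewer cycles, which is where the mixed-precision speedup comes from.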
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠 Researchers introduce AscendOptimizer, an AI agent that optimizes operators for Huawei's Ascend NPUs through evolutionary search and experience-based learning. The system achieved 1.19x geometric-mean speedup over baselines on 127 real operators, with nearly 50% outperforming reference implementations.
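A toy in the spirit of evolutionary operator search: mutate candidate configurations, keep the best scorer, and repeat. The tunable parameter (a tile size) and the cost model are entirely invented stand-ins; a real system like the one summarized would measure kernels on hardware instead.

```python
# Toy evolutionary search over a single operator parameter.
import random

random.seed(0)

def cost(tile: int) -> float:
    # stand-in cost model: pretend tile=32 is optimal for the hardware
    return abs(tile - 32) + 1.0

def evolve(generations: int = 50, pop: int = 8) -> int:
    population = [random.choice([4, 8, 16, 64, 128]) for _ in range(pop)]
    for _ in range(generations):
        best = min(population, key=cost)   # elitism: keep the incumbent
        # mutate around the current best candidate (experience reuse)
        population = [best] + [max(1, best + random.choice([-8, -4, 4, 8]))
                               for _ in range(pop - 1)]
    return min(population, key=cost)

best_tile = evolve()
```

The summarized system layers an LLM agent and learned experience on top of this basic mutate-evaluate-select loop.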
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers introduce EvoKernel, a self-evolving AI framework that addresses the 'Data Wall' problem in deploying Large Language Models for kernel synthesis on data-scarce hardware platforms like NPUs. The system uses memory-based reinforcement learning to improve correctness from 11% to 83% and achieves 3.60x speedup through iterative refinement.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠 Researchers propose BiKA, a new ultra-lightweight neural network accelerator inspired by Kolmogorov-Arnold Networks that uses binary thresholds instead of complex computations. The FPGA prototype demonstrates 27-51% reduction in hardware resource usage compared to existing binarized and quantized neural network accelerators while maintaining competitive accuracy.
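A minimal sketch of the binary-threshold idea: where a Kolmogorov-Arnold Network places a learned univariate function on each edge, each input here passes through a set of threshold comparisons whose fired weights are summed, so the datapath needs only comparators and adders, no multipliers. The thresholds and weights below are invented, not BiKA's actual parameters.

```python
# Threshold-based univariate edge function, KAN-style node as a sum of edges.
def threshold_edge(x: float, thresholds, weights) -> float:
    # each (threshold, weight) pair contributes weight iff x >= threshold
    return sum(w for t, w in zip(thresholds, weights) if x >= t)

def node(xs, edges) -> float:
    # KAN-style node: sum the per-input univariate edge outputs
    return sum(threshold_edge(x, t, w) for x, (t, w) in zip(xs, edges))

edges = [([0.0, 0.5], [1.0, 2.0]),   # edge for input 0 (invented values)
         ([-1.0, 1.0], [0.5, 0.5])]  # edge for input 1 (invented values)
out = node([0.7, 0.2], edges)        # 1.0 + 2.0 (input 0) + 0.5 (input 1)
```

Each edge is effectively a step-function approximation of a learned curve, which is what makes an ultra-lightweight FPGA mapping possible.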
AI · Bullish · Hugging Face Blog · Mar 28 · 6/10 · 7
🧠 The article covers accelerating Large Language Model (LLM) inference using Text Generation Inference (TGI) on Intel Gaudi hardware, an AI-infrastructure optimization aimed at improving the performance and efficiency of LLM deployment.
AI · Bullish · Hugging Face Blog · May 25 · 6/10 · 6
🧠 Intel has released optimization techniques for running Stable Diffusion AI models on CPUs using NNCF (Neural Network Compression Framework) and Hugging Face Optimum. These optimizations aim to improve performance and reduce computational requirements for AI image generation on Intel hardware without requiring expensive GPUs.
AI · Bullish · Hugging Face Blog · Jun 15 · 6/10 · 4
🧠 Intel has partnered with Hugging Face to democratize machine learning hardware acceleration, making AI model deployment more accessible across different hardware platforms. This collaboration aims to optimize AI workloads on Intel hardware while leveraging Hugging Face's extensive model ecosystem.
AI · Bullish · Hugging Face Blog · Nov 19 · 4/10 · 5
🧠 The article discusses methods for accelerating PyTorch distributed fine-tuning using Intel's hardware and software technologies. It focuses on optimizations for training deep learning models more efficiently on Intel infrastructure.
AI · Neutral · Hugging Face Blog · Feb 6 · 3/10 · 3
🧠 The article appears to be about optimizing PyTorch Transformers performance using Intel Sapphire Rapids processors, but the article body content is missing from the provided text.