#acceleration News & Analysis

6 articles tagged with #acceleration. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

PARD-2 introduces a dual-mode speculative decoding framework that accelerates large language model inference by up to 6.94× by training the draft model to maximize token acceptance rather than prediction accuracy. It uses Confidence-Adaptive Token optimization so that a single draft model can operate in both target-dependent and target-independent modes, and it significantly outperforms existing methods such as EAGLE-3.

🧠 Llama
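PARD-2's central idea is to train the draft model for token acceptance. What "acceptance" means can be illustrated with the generic greedy form of speculative decoding, sketched here under simplifying assumptions: `draft` and `target` are hypothetical toy next-token functions, drafting is sequential (PARD-style parallel drafting and sampling-based acceptance are not modeled), and this is not PARD-2's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: given a token prefix, return the token it
# would pick greedily. Real systems use neural networks and logits.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One round of greedy speculative decoding (generic textbook variant)."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target verifies the proposal. A real system scores all k
    #    positions in one parallel forward pass; here we call it per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and stop the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token was accepted, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


# Hypothetical toy models over integer tokens.
def target(ctx: List[int]) -> int:
    return len(ctx) % 5


def draft(ctx: List[int]) -> int:
    # Agrees with the target for the first 3 positions, then drifts.
    return len(ctx) % 5 if len(ctx) < 3 else 0


print(speculative_step([], draft, target, k=4))  # [0, 1, 2, 3]
```

Here three of the four drafted tokens are accepted, so one verification round yields four tokens instead of one. A draft model trained to lengthen that accepted prefix makes each expensive target pass more productive, which is the quantity PARD-2's summary says it optimizes.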

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations

Researchers have developed QUARK, a quantization-enabled FPGA acceleration framework that speeds up Transformer models by sharing circuits across nonlinear operations that exhibit common patterns. The system achieves up to a 1.96× speedup over GPU implementations while reducing hardware overhead by more than 50% compared with existing approaches.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep

Researchers introduce HetCache, a training-free acceleration framework for diffusion-based video editing that achieves a 2.67× speedup by selectively caching contextually relevant tokens rather than computing every attention operation in full. The method reduces computational redundancy in Diffusion Transformers while preserving editing quality and consistency.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

Stateful Token Reduction for Long-Video Hybrid VLMs

Researchers developed a token reduction method for hybrid vision-language models that process long videos, achieving a 3.8–4.2× speedup while retaining only 25% of visual tokens. The approach combines progressive reduction with a unified scoring scheme for both attention and Mamba blocks, maintaining near-baseline accuracy on long-context video benchmarks.

$NEAR
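A minimal sketch of score-based, progressive token pruning, assuming each visual token already has an importance score. The paper's unified scoring for attention and Mamba blocks is not reproduced; `reduce_tokens`, `progressive_reduce`, and `score_fn` are hypothetical names for illustration. Two stages that each keep 50% of tokens leave 25% overall, matching the retention figure in the summary.

```python
import math
from typing import Callable, List, Sequence


def reduce_tokens(tokens: Sequence[str], scores: Sequence[float],
                  keep_ratio: float = 0.25) -> List[str]:
    """Keep the highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(tokens) * keep_ratio))
    # Indices of the k highest scores.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort by index so the surviving tokens stay in temporal order.
    return [tokens[i] for i in sorted(top)]


def progressive_reduce(tokens: Sequence[str],
                       score_fn: Callable[[str], float],
                       ratios: Sequence[float]) -> List[str]:
    """Prune in stages; scores are recomputed on the survivors each time."""
    kept = list(tokens)
    for r in ratios:
        kept = reduce_tokens(kept, [score_fn(t) for t in kept], r)
    return kept


tokens = [f"v{i}" for i in range(8)]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.7, 0.4]
print(reduce_tokens(tokens, scores))  # ['v1', 'v3']

# Hypothetical score: the token's index (later frames score higher).
print(progressive_reduce(tokens, lambda t: int(t[1:]), [0.5, 0.5]))  # ['v6', 'v7']
```

Re-scoring between stages is what distinguishes progressive reduction from a single cut: tokens that survive the first stage are compared only against one another in the next.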

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

FastBUS: A Fast Bayesian Framework for Unified Weakly-Supervised Learning

Researchers propose FastBUS, a Bayesian framework for weakly-supervised machine learning that addresses the computational inefficiency of existing methods. The framework uses probabilistic transitions and belief propagation to achieve state-of-the-art results while running up to hundreds of times faster than current general-purpose methods.

AI · Bullish · Hugging Face Blog · Jun 13 · 6/10 · 5

Hugging Face and AMD partner on accelerating state-of-the-art models for CPU and GPU platforms

Hugging Face and AMD have announced a partnership to optimize and accelerate state-of-the-art AI models for both CPU and GPU platforms. This collaboration aims to improve performance and accessibility of AI models across AMD's hardware ecosystem.