y0news

#model-acceleration News & Analysis

8 articles tagged with #model-acceleration. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism

SpecBranch introduces a novel speculative decoding framework that leverages branch parallelism to accelerate large language model inference, achieving 1.8x to 4.5x speedups over standard auto-regressive decoding. The technique addresses serialization bottlenecks in existing speculative decoding methods by implementing parallel drafting branches with adaptive token lengths and rollback-aware orchestration.
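
Stripped to its essentials, the verify-and-accept loop that speculative decoding methods like SpecBranch build on can be sketched in a few lines. This is a generic greedy single-branch round, not SpecBranch's parallel-branch version; `draft_next` and `target_next` are hypothetical stand-ins for the cheap draft model and the expensive target model:

```python
def speculative_decode_step(draft_next, target_next, prefix, k=4):
    """One greedy speculative-decoding round.

    draft_next / target_next: callables mapping a token prefix to the
    next token (stand-ins for the draft and target models).
    Returns the tokens accepted this round.
    """
    # 1. Draft k tokens auto-regressively with the cheap model.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. Verify: the target scores all k positions (in parallel in a
    #    real system; sequentially here for clarity) and we keep the
    #    longest prefix on which draft and target agree.
    accepted = []
    ctx = list(prefix)
    for t in proposal:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break

    # 3. At the first disagreement (or after k accepts) the target's
    #    own token comes for free, so progress is >= 1 token per round.
    accepted.append(target_next(ctx))
    return accepted
```

The serialization bottleneck the paper targets lives in step 1: each drafted token waits on the previous one, which is what parallel drafting branches relax.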

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

When Drafts Evolve: Speculative Decoding Meets Online Learning

Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding for large language model inference acceleration. The approach leverages verification feedback to evolve draft models dynamically, achieving up to 24% speedup improvements across seven benchmarks and three foundation models.
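
The core feedback loop can be illustrated with a deliberately tiny stand-in: a unigram "draft model" whose distribution drifts toward whatever the target model actually emits during verification. This is an illustration of the feedback signal only, not OnlineSpec's actual learner:

```python
from collections import Counter

class OnlineUnigramDraft:
    """Toy draft model updated online from verification feedback.

    Every time the target model verifies a drafted position, the
    target's true token is fed back, so the draft's proposals evolve
    toward what the target actually produces.
    """
    def __init__(self):
        self.counts = Counter()

    def propose(self):
        # Most frequently verified token so far (None before any feedback).
        return self.counts.most_common(1)[0][0] if self.counts else None

    def feedback(self, target_token):
        # Verification step: record what the target really emitted.
        self.counts[target_token] += 1
```

In the paper the draft is a full model and the update is a learning step, but the signal is the same: verification outcomes are free training data that arrive during inference.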

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization

Researchers introduce Dynamic Pruning Policy Optimization (DPPO), a new framework that accelerates AI language model training by 2.37x while maintaining accuracy. The method addresses computational bottlenecks in Group Relative Policy Optimization through unbiased gradient estimation and improved data efficiency.
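
For context, group-based methods like GRPO standardize each sample's reward within its group, which also shows why pruning can pay off: a group whose rewards are all identical has zero advantage everywhere and contributes no gradient. A minimal sketch of both ideas (the paper's unbiased estimator is more subtle; this only shows the motivating case):

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages, GRPO-style: each sample's reward is
    standardized against its own group's mean and std."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def prune_groups(groups):
    """Toy pruning rule: a group with identical rewards yields zero
    advantage for every member, so it can be skipped without changing
    the gradient. (DPPO's dynamic, unbiased scheme goes further.)"""
    return [g for g in groups if len(set(g)) > 1]
```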

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Researchers propose NExt, a nonlinear extrapolation framework that accelerates reinforcement learning with verifiable rewards (RLVR) for large language models by modeling low-rank parameter trajectories. The method reduces computational overhead by approximately 37.5% while remaining compatible with various RLVR algorithms, addressing a key bottleneck in scaling LLM training.
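
The trajectory-modeling idea can be sketched with a simplified linear variant (NExt itself uses a nonlinear extrapolator; this hypothetical version uses SVD plus a linear fit purely to show the shape of the computation):

```python
import numpy as np

def extrapolate_trajectory(checkpoints, steps_ahead=1, rank=2):
    """Simplified low-rank trajectory extrapolation.

    Stack flattened parameter snapshots, project onto a low-rank
    basis via SVD, extrapolate each component's coefficient over
    training time, and reconstruct a predicted future parameter
    vector -- skipping the RL steps in between.
    """
    X = np.stack(checkpoints)                  # (T, num_params)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coeffs = U[:, :rank] * S[:rank]            # (T, rank) trajectory coords
    t = np.arange(len(checkpoints))
    future = []
    for j in range(rank):
        a, b = np.polyfit(t, coeffs[:, j], 1)  # linear fit per component
        future.append(a * (t[-1] + steps_ahead) + b)
    return np.array(future) @ Vt[:rank]        # back to parameter space
```

The compute saving comes from predicting checkpoints instead of training to them; the low-rank assumption is what makes the extrapolation tractable at LLM scale.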

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

MeanCache introduces a training-free caching framework that accelerates Flow Matching inference by using average velocities instead of instantaneous ones. The framework achieves 3.59x to 4.56x acceleration on major AI models like FLUX.1, Qwen-Image, and HunyuanVideo while maintaining superior generation quality compared to existing caching methods.
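
The caching intuition can be shown on a toy Euler sampler: instead of querying the expensive velocity network at every step, refresh it periodically and reuse a running average in between. This is illustrative only; MeanCache's actual average-velocity estimator differs:

```python
def sample_with_velocity_cache(velocity, x0, n_steps=8, refresh_every=2):
    """Toy Euler sampler for flow matching with velocity caching.

    `velocity` stands in for the expensive velocity network. It is
    only called every `refresh_every` steps; cached steps integrate a
    running (exponential) average of past velocities instead.
    """
    dt = 1.0 / n_steps
    x = x0
    calls = 0
    avg_v = None
    for i in range(n_steps):
        t = i * dt
        if i % refresh_every == 0:
            v = velocity(x, t)          # expensive model call
            calls += 1
            avg_v = v if avg_v is None else 0.5 * (avg_v + v)
        x = x + avg_v * dt              # cached steps are nearly free
    return x, calls
```

With `refresh_every=2` the model is called half as often; the paper's claim is that averaging velocities preserves quality far better than naively reusing the last instantaneous velocity.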

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

Task-Centric Acceleration of Small-Language Models

Researchers propose TASC (Task-Adaptive Sequence Compression), a framework for accelerating small language models through two methods: TASC-ft for fine-tuning with expanded vocabularies and TASC-spec for training-free speculative decoding. The methods demonstrate improved inference efficiency while maintaining task performance across low output-variability generation tasks.
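
The vocabulary-expansion side of the idea can be sketched with greedy macro-token replacement: frequent multi-token spans become single tokens, so the compressed sequence needs fewer auto-regressive decode steps. The `macros` mapping is hypothetical here; TASC-ft learns its expanded vocabulary during fine-tuning:

```python
def compress(tokens, macros):
    """Toy task-adaptive sequence compression.

    `macros` maps a multi-token tuple to one new vocabulary entry.
    Greedy longest-match-first replacement: each matched span becomes
    a single token, shortening the sequence the model must generate.
    """
    out, i = [], 0
    spans = sorted(macros, key=len, reverse=True)  # longest match first
    while i < len(tokens):
        for span in spans:
            if tuple(tokens[i:i + len(span)]) == span:
                out.append(macros[span])
                i += len(span)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

This works best exactly where the paper targets it: low output-variability tasks, where a small set of macro spans covers much of the output.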

AI · Bullish · Hugging Face Blog · Apr 3 · 4/10

Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon

The article appears to discuss optimizing SetFit inference performance using Hugging Face's Optimum Intel library on Intel Xeon processors. This represents a technical advancement in AI model optimization and deployment efficiency on enterprise hardware.