8 articles tagged with #model-acceleration. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 1d ago · 7/10
🧠SpecBranch introduces a novel speculative decoding framework that leverages branch parallelism to accelerate large language model inference, achieving 1.8x to 4.5x speedups over standard auto-regressive decoding. The technique addresses serialization bottlenecks in existing speculative decoding methods by implementing parallel drafting branches with adaptive token lengths and rollback-aware orchestration.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding for large language model inference acceleration. The approach leverages verification feedback to evolve draft models dynamically, achieving up to 24% speedup improvements across seven benchmarks and three foundation models.
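The core idea of learning from verification feedback can be shown with a deliberately tiny stand-in: here the "draft model" is just a bigram frequency table updated online from the target's verified output (an illustrative assumption, not OnlineSpec's actual method).

```python
from collections import defaultdict

# Hedged sketch: the draft "model" is a bigram table that is updated
# online from whatever the target model actually emitted during
# verification, so draft acceptance improves as decoding proceeds.

class OnlineDraft:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def propose(self, prev_token):
        nxt = self.counts[prev_token]
        return max(nxt, key=nxt.get) if nxt else None

    def update(self, prev_token, verified_token):
        # verification feedback: what the target actually produced
        self.counts[prev_token][verified_token] += 1

draft = OnlineDraft()
stream = "abababac"                 # pretend target output
prev, hits = stream[0], 0
for ch in stream[1:]:
    if draft.propose(prev) == ch:   # would the draft have been accepted?
        hits += 1
    draft.update(prev, ch)
    prev = ch
print(hits)  # acceptance rises as the table fills in
```

A real system would instead take gradient steps on the draft network, but the feedback loop — verify, then adapt the drafter — is the same.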
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce Dynamic Pruning Policy Optimization (DPPO), a new framework that accelerates AI language model training by 2.37x while maintaining accuracy. The method addresses computational bottlenecks in Group Relative Policy Optimization through unbiased gradient estimation and improved data efficiency.
AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose NExt, a nonlinear extrapolation framework that accelerates reinforcement learning with verifiable rewards (RLVR) for large language models by modeling low-rank parameter trajectories. The method reduces computational overhead by approximately 37.5% while remaining compatible with various RLVR algorithms, addressing a key bottleneck in scaling LLM training.
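Trajectory extrapolation is easy to picture with a linear toy version (the paper models nonlinear low-rank trajectories; this sketch, including the `w_hist` checkpoint list, is an illustrative assumption):

```python
# Hedged sketch: extrapolate a parameter trajectory from recent
# checkpoints to skip optimizer steps. Linear here for clarity; NExt
# fits nonlinear low-rank trajectories instead.

def extrapolate(w_hist, horizon=1):
    # Fit the most recent step direction and project it `horizon`
    # steps ahead instead of running those training steps.
    prev, last = w_hist[-2], w_hist[-1]
    delta = [b - a for a, b in zip(prev, last)]
    return [w + horizon * d for w, d in zip(last, delta)]

w_hist = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]   # toy flattened checkpoints
print(extrapolate(w_hist, horizon=2))           # ≈ [0.4, 0.6]
```

Skipping steps this way is where the ~37.5% overhead reduction would come from: extrapolated parameters replace gradient computations the RLVR loop would otherwise pay for.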
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠MeanCache introduces a training-free caching framework that accelerates Flow Matching inference by using average velocities instead of instantaneous ones. The framework achieves 3.59x to 4.56x acceleration on major AI models like FLUX.1, Qwen-Image, and HunyuanVideo while maintaining superior generation quality compared to existing caching methods.
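The idea of substituting a cached average velocity for a fresh network call can be sketched on a one-dimensional sampler. This toy integrator is an assumption for illustration, not MeanCache's actual algorithm; `velocity` stands in for the learned flow network.

```python
# Hedged sketch: a flow-matching sampler integrates
#   x_{t+dt} = x_t + v(x_t, t) * dt,
# where v is an expensive neural network. On every other step we reuse
# the average of the most recent real velocities instead of calling the
# network, halving model evaluations.

def velocity(x, t):                 # stand-in for the learned network
    return -x + t

def cached_sample(x0, steps=8):
    x, dt = x0, 1.0 / steps
    recent = []                     # velocities from real model calls
    for i in range(steps):
        t = i * dt
        if i % 2 == 0 or not recent:        # real (expensive) evaluation
            v = velocity(x, t)
            recent.append(v)
        else:                               # cached step: average velocity
            window = recent[-2:]
            v = sum(window) / len(window)
        x = x + v * dt
    return x

print(cached_sample(1.0))
```

The trade-off is the usual caching one: fewer network calls per sampled image or video frame, at the cost of a small integration error that the averaging is designed to keep in check.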
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠Researchers propose TASC (Task-Adaptive Sequence Compression), a framework for accelerating small language models through two methods: TASC-ft for fine-tuning with expanded vocabularies and TASC-spec for training-free speculative decoding. The methods demonstrate improved inference efficiency while maintaining task performance across low output-variability generation tasks.
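Vocabulary expansion for sequence compression can be illustrated with a single BPE-style merge step, used here purely to show the principle (TASC's actual procedure may differ; the example tokens are assumptions):

```python
from collections import Counter

# Hedged sketch: merge the most frequent adjacent token pair into one
# new vocabulary symbol. Each merge shortens the sequence the model has
# to generate, which is the compression lever behind vocabulary
# expansion approaches like TASC-ft.

def merge_most_frequent_pair(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    (a, b), _ = pairs.most_common(1)[0]
    merged_symbol = a + b               # new vocabulary entry
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(merged_symbol)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out, merged_symbol

toks = list("low lower lowest".replace(" ", "_"))
compressed, sym = merge_most_frequent_pair(toks)
print(len(toks), "->", len(compressed), "merged:", sym)  # 16 -> 13
```

Fewer generation steps per output is exactly why such compression pays off most on the low output-variability tasks the summary mentions.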
AI · Bullish · Hugging Face Blog · Sep 29 · 5/10
🧠The article covers optimizing Qwen3-8B AI agent performance on Intel Core Ultra processors using depth-pruned draft models, improving inference speed and efficiency on consumer-grade Intel hardware.
AI · Bullish · Hugging Face Blog · Apr 3 · 4/10
🧠The article covers optimizing SetFit inference performance with Hugging Face's Optimum Intel library on Intel Xeon processors, improving model optimization and deployment efficiency on enterprise hardware.