AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers introduce COLD-Steer, a training-free framework that enables efficient control of large language model behavior at inference time using just a few examples. The method approximates gradient descent effects without parameter updates, achieving 95% steering effectiveness while using 50 times fewer samples than existing approaches.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers use activation steering to reduce reasoning biases in large language models, particularly the tendency to confuse content plausibility with logical validity. Their K-CAST method achieved up to a 15% improvement in formal reasoning accuracy while remaining robust across different tasks and languages.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce Multi-Sequence Verifier (MSV), a new technique that improves large language model performance by jointly processing multiple candidate solutions rather than scoring them individually. The system achieves better accuracy while reducing inference latency by approximately half through improved calibration and early-stopping strategies.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce OSCAR, a new query-dependent online soft compression method for Retrieval-Augmented Generation (RAG) systems that reduces computational overhead while maintaining performance. The method achieves 2-5x speed improvements in inference with minimal accuracy loss across LLMs from 1B to 24B parameters.
🏢 Hugging Face
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers developed EvoPrune, a new method that prunes visual tokens during the encoding stage of Multimodal Large Language Models (MLLMs) rather than after encoding. The technique achieves 2x inference speedup with less than 1% performance loss on video datasets, addressing efficiency bottlenecks in AI models processing high-resolution images and videos.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Researchers introduce Energy Landscape Steering (ELS), a new framework that reduces false refusals in AI safety-aligned language models without compromising security. The method uses an external Energy-Based Model to dynamically guide model behavior during inference, improving compliance from 57.3% to 82.6% on safety benchmarks.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Nightjar is a new adaptive speculative decoding framework for large language models that dynamically adjusts to system load conditions. It achieves 27.29% higher throughput and up to 20.18% lower latency by intelligently enabling or disabling speculation based on workload demands.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠Researchers developed a new channel-adaptive AI algorithm that maximizes inference throughput in 6G edge computing networks by dynamically adjusting computational complexity based on channel conditions. The system uses integrated communication and computation (IC²) to optimize both feature compression and model complexity for mobile edge inference.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 4
🧠Researchers propose a 'best-of-∞' approach for large language models: majority voting in the limit of infinitely many samples, which performs strongly but would require infinite computation. To make it practical, they develop an adaptive generation scheme that chooses the number of samples based on answer agreement, and they extend the framework to weighted ensembles of multiple LLMs.
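The idea of stopping sampling once answers agree can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's procedure: the function name `adaptive_majority_vote`, the `sample_answer` callback, and the fixed agreement threshold are all hypothetical, and the summary does not say which stopping rule the authors use.

```python
from collections import Counter
from typing import Callable, Hashable


def adaptive_majority_vote(
    sample_answer: Callable[[], Hashable],
    min_samples: int = 3,
    max_samples: int = 50,
    agreement: float = 0.8,
) -> tuple[Hashable, int]:
    """Draw answers until the leading answer's share reaches `agreement`.

    Returns (majority answer, number of samples drawn). The threshold rule is
    an illustrative heuristic (assumption), not the rule from the paper.
    """
    counts: Counter = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        top_answer, top_count = counts.most_common(1)[0]
        # Stop early once enough samples agree on one answer.
        if n >= min_samples and top_count / n >= agreement:
            return top_answer, n
    # Budget exhausted: fall back to the plain majority answer.
    return counts.most_common(1)[0][0], max_samples
```

In practice `sample_answer` would wrap one LLM generation plus answer extraction; easy questions stop after a handful of samples, while contested ones use the full budget.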
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed SageBwd, a trainable INT8 attention mechanism that can match full-precision attention performance during pre-training while quantizing six of seven attention matrix multiplications. The study identifies key factors for stable training including QK-norm requirements and the impact of tokens per step on quantization errors.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers developed a new robotic policy framework using dense-jump flow matching with non-uniform time scheduling to address performance degradation in multi-step inference. The approach achieves up to 23.7% performance gains over existing baselines by optimizing integration scheduling during training and inference phases.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers have developed BWCache, a training-free method that accelerates Diffusion Transformer (DiT) video generation by up to 6× through block-wise feature caching and reuse. The technique exploits computational redundancy in DiT blocks across timesteps while maintaining visual quality, addressing a key bottleneck in real-world AI video generation applications.
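As a rough illustration of block-wise feature caching, the sketch below reuses a block's previous output whenever its input has barely changed since the last time that block was computed. Everything here is an assumption made for illustration: the `BlockCache` class, the relative-change test, and the `threshold` value are hypothetical, and the summary does not specify BWCache's actual reuse criterion.

```python
import numpy as np


class BlockCache:
    """Toy per-block cache: skip a block when its input is nearly unchanged."""

    def __init__(self, blocks, threshold=0.05):
        self.blocks = blocks                      # list of callables: array -> array
        self.threshold = threshold                # hypothetical reuse threshold
        self.cached_in = [None] * len(blocks)     # input at last real computation
        self.cached_out = [None] * len(blocks)    # output at last real computation
        self.recomputed = 0                       # number of real block evaluations

    def forward(self, x):
        for i, block in enumerate(self.blocks):
            prev = self.cached_in[i]
            if prev is not None:
                # Relative change of this block's input since it was last computed.
                change = np.linalg.norm(x - prev) / (np.linalg.norm(prev) + 1e-8)
                if change < self.threshold:
                    x = self.cached_out[i]        # reuse the cached features
                    continue
            self.cached_in[i] = x
            x = block(x)
            self.cached_out[i] = x
            self.recomputed += 1
        return x
```

Calling `forward` once per denoising timestep, the `recomputed` counter shows how many block evaluations were actually paid for; a real system would also bound how many consecutive steps a cached value may be reused.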
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed ViT-Linearizer, a distillation framework that transfers Vision Transformer knowledge into linear-time models, addressing quadratic complexity issues for high-resolution inputs. The method achieves 84.3% ImageNet accuracy while providing significant speedups, bridging the gap between efficient RNN-based architectures and transformer performance.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves 1.3-3.6x speedup on mixed-precision models while supporting higher clock frequencies up to 250MHz, addressing the trade-off between hardware efficiency and inference accuracy.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers introduce Spatial Credit Redistribution (SCR), a training-free method that reduces hallucination in vision-language models by 4.7-6.0 percentage points. The technique redistributes attention from dominant visual patches to contextual areas, addressing the spatial credit collapse problem that causes AI models to generate false objects.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce CosmicFish-HRM, a compact language model that uses a Hierarchical Reasoning Module to dynamically adjust computational effort during inference based on input complexity. The approach challenges the assumption that larger models are necessary for advanced reasoning, suggesting adaptive computation depth could offer efficiency gains as model scale increases.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduced ReasonOps, an unsupervised method for analyzing chain-of-thought traces from large language models that identifies seven universal reasoning operators (backtracking, inferring, hypothesizing, etc.) appearing consistently across 12 different LLM families. The framework enables model identification, correctness prediction, and early quality estimation without manual annotation, revealing that each model family has a distinctive reasoning fingerprint.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce SWAI, a training-free method for controlling language model outputs by manipulating logit scores using corpus-derived statistics. The technique enables real-time steering of model behavior—such as adjusting readability, politeness, and toxicity—without modifying model weights or accessing internal layers, outperforming existing prompt-based and logit-level baselines.
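One generic way to steer at the logit level with corpus statistics is to add a per-token bias derived from how much more often a token appears in a target-style corpus than in a reference corpus. The sketch below shows that general idea only; it is not SWAI's formulation, and the function names, the smoothed log-odds statistic, and the `strength` parameter are assumptions.

```python
import math
from collections import Counter


def token_log_odds(target_tokens, reference_tokens, smoothing=1.0):
    """Smoothed log-odds of each token in the target corpus vs. the reference."""
    vocab = set(target_tokens) | set(reference_tokens)
    t, r = Counter(target_tokens), Counter(reference_tokens)
    t_total = len(target_tokens) + smoothing * len(vocab)
    r_total = len(reference_tokens) + smoothing * len(vocab)
    return {
        tok: math.log((t[tok] + smoothing) / t_total)
        - math.log((r[tok] + smoothing) / r_total)
        for tok in vocab
    }


def steer_logits(logits, log_odds, strength=1.0):
    """Add a scaled corpus-derived bias to each logit; unseen tokens are unchanged."""
    return {
        tok: score + strength * log_odds.get(tok, 0.0)
        for tok, score in logits.items()
    }
```

Because the bias is applied only to the output scores, this style of control needs no access to weights or hidden layers; `strength` trades off steering against fidelity to the model's original distribution.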
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠BlockBatch introduces a training-free inference framework that optimizes diffusion language models by executing multiple block-size branches simultaneously, achieving 26.6% reduction in computational steps and 1.33x speedup over existing methods. The approach exploits the complementary nature of different decoding granularities to balance parallelism with accuracy while managing the inherent trade-offs in block-wise inference.
AI · Neutral · Decrypt · 3d ago · 6/10
🧠Chinese researchers have developed an AI model that leverages idle processing time to predict and prepare for users' next queries before they're asked. This advancement in predictive AI could reduce latency and improve user experience by pre-computing likely requests during periods when the system would otherwise be inactive.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduced HRBench, a unified evaluation framework for testing hybrid-reasoning LLMs that allow dynamic switching between fast and slow reasoning modes. The framework systematically compares 12+ prior methods across three switching strategy families and four training approaches, revealing that prompt-based methods offer better token-accuracy trade-offs while routing methods provide more stable cost reduction.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce DREAM-R, a framework that accelerates reasoning in multimodal AI models through improved speculative execution. The system uses reinforcement learning to align draft models with target reasoning, a verification mechanism to prevent errors, and parallel processing to achieve significant speedup while maintaining accuracy.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers analyzed backtracking patterns in reasoning traces from the Qwen3-8B model, finding that correct reasoning typically shows early, isolated self-corrections while incorrect reasoning exhibits persistent, clustered revisions occurring late in traces. The study demonstrates that burst-aware filtering of reasoning traces can improve model reliability by identifying unstable reasoning patterns before completion.
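A simple version of burst-aware filtering can be sketched as follows: locate backtracking markers in a trace, then flag the trace if several of them cluster together in its later part. The marker words, window size, event count, and "late" cutoff below are hypothetical choices for illustration; the summary does not give the study's actual detection rule.

```python
def backtrack_positions(tokens, markers=("wait", "actually", "hmm")):
    """Indices of tokens that look like backtracking markers (hypothetical list)."""
    return [
        i for i, tok in enumerate(tokens)
        if tok.lower().strip(".,!?") in markers
    ]


def has_late_burst(positions, trace_len, window=20, min_events=3, late_fraction=0.5):
    """True if at least `min_events` markers fall within `window` tokens of each
    other in the part of the trace after `late_fraction * trace_len`."""
    late = [p for p in positions if p >= late_fraction * trace_len]
    for i in range(len(late) - min_events + 1):
        # Compare the first and last marker of each run of `min_events` markers.
        if late[i + min_events - 1] - late[i] <= window:
            return True
    return False
```

A trace with one early, isolated correction passes this check, while a trace with a late cluster of revisions is flagged and could be discarded or resampled before it completes.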
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
🧠Researchers present SeDT, a training-free method that improves large language model performance in multi-turn conversations by annotating conversation history with relevance scores, addressing a documented 39% performance drop when tasks are revealed incrementally across multiple turns.