AIBullisharXiv – CS AI · May 127/10
🧠Researchers introduce Mixed-Policy Distillation (MPD), a technique that compresses reasoning in smaller language models by having larger teacher models rewrite student-generated reasoning traces into more concise versions. The method reduces token usage by up to 27.1% while maintaining or improving performance, addressing critical deployment constraints around memory, latency, and serving costs.
AIBullisharXiv – CS AI · May 97/10
🧠ReaComp introduces a method to compile reasoning traces from large language models into reusable symbolic program synthesizers that eliminate runtime LLM calls. The approach achieves 91.3% accuracy on benchmark tasks while reducing token usage by 78%, demonstrating that neuro-symbolic hybrid systems can outperform pure LLM inference on complex program synthesis problems.
AIBullisharXiv – CS AI · Apr 67/10
🧠Researchers discovered that in Large Reasoning Models like DeepSeek-R1, the first solution is often the best, with alternative solutions being detrimental due to error accumulation. They propose RED, a new framework that achieves up to 19% performance gains while reducing token consumption by 37.7-70.4%.
AIBullisharXiv – CS AI · Mar 177/10
🧠Researchers introduce D-MEM, a biologically-inspired memory architecture for AI agents that uses dopamine-like reward prediction error routing to dramatically reduce computational costs. The system reduces token consumption by over 80% and eliminates quadratic scaling bottlenecks by selectively processing only high-importance information through cognitive restructuring.
AIBullisharXiv – CS AI · Mar 177/10
🧠Researchers developed SToRM, a new framework that reduces computational costs for autonomous driving systems using multi-modal large language models by up to 30x while maintaining performance. The system uses supervised token reduction techniques to enable real-time end-to-end driving on standard GPUs without sacrificing safety or accuracy.
AIBullisharXiv – CS AI · Mar 37/104
🧠Researchers introduce LightMem, a new memory system for Large Language Models that mimics human memory structure with three stages: sensory, short-term, and long-term memory. The system achieves up to 7.7% better QA accuracy while reducing token usage by up to 106x and API calls by up to 159x compared to existing methods.
AIBullisharXiv – CS AI · Feb 277/107
🧠Researchers introduce Contextual Memory Virtualisation (CMV), a system that preserves LLM understanding across extended sessions by treating context as version-controlled state using DAG-based management. The system includes a trimming algorithm that reduces token counts by 20-86% while preserving all user interactions, demonstrating particular efficiency in tool-use sessions.
AIBullisharXiv – CS AI · Mar 176/10
🧠Researchers developed monitoring strategies to detect when Large Reasoning Models are engaging in unproductive reasoning by identifying early failure signals. The new techniques reduce token usage by 62.7-93.6% while maintaining accuracy, significantly improving AI model efficiency.
AIBullisharXiv – CS AI · Mar 36/106
🧠Researchers developed a new token reduction method for hybrid vision-language models that process long videos, achieving 3.8-4.2x speedup while retaining only 25% of visual tokens. The approach uses progressive reduction and unified scoring for both attention and Mamba blocks, maintaining near-baseline accuracy on long-context video benchmarks.
$NEAR