AIBullisharXiv – CS AI · Jun 107/10
🧠Researchers demonstrate that selective context management—retaining only recent tool interactions plus automated summarization—enables LLM agents to complete enterprise workflows with 91.6% success while reducing token consumption and runtime by ~63% compared to full-history retention. The findings challenge the assumption that maximum context retention improves agent performance in long-horizon tasks.
🧠 GPT-5🧠 Claude🧠 Sonnet
AIBullisharXiv – CS AI · Jun 57/10
🧠Researchers introduce Dynamic Thinking-Token Selection (DynTS), a method that optimizes Large Reasoning Models by identifying and retaining only decision-critical tokens during inference while discarding redundant reasoning trace data. This approach significantly reduces memory footprint and computational overhead, addressing a major efficiency bottleneck in LRMs that generate extended reasoning sequences.
AIBullisharXiv – CS AI · Jun 27/10
🧠FastSLM introduces a Hierarchical Temporal Abstractor (HTA) that compresses long-form speech into just 1.67 tokens per second—a 97% reduction—while maintaining competitive performance on speech understanding benchmarks. This architecture solves a critical scaling bottleneck for multimodal AI models by preserving acoustic detail despite extreme compression, enabling efficient deployment of speech-capable language models.
AIBullisharXiv – CS AI · Jun 17/10
🧠DynaTree is a two-stage framework for efficient news retrieval that combines offline agentic reasoning with lightweight online subtree selection, achieving significant improvements in real-world deployment. The system demonstrated a 59-73% survival rate versus 32-53% for fixed approaches in production A/B testing, highlighting the practical value of persistent semantic expansion for time-sensitive information retrieval.
AIBullisharXiv – CS AI · May 127/10
🧠Researchers introduce SPARK, a framework that verifies AI agent skills through direct environment interaction rather than relying on pre-written plans. The Posterior Distillation Index (PDI) metric ensures skills are grounded in actual task evidence, producing student models that match or exceed human-written skills while reducing inference costs by up to 1,000x.
AIBullisharXiv – CS AI · Jun 56/10
🧠Researchers introduce FuseSearch, an AI system that optimizes parallel code localization by reducing redundant tool invocations from 34.9% to near-zero through adaptive execution strategies. The approach combines supervised fine-tuning and reinforcement learning to dynamically adjust search breadth, achieving state-of-the-art performance on SWE-bench while using 68.9% fewer tokens and delivering 93.6% speedup.
AIBullisharXiv – CS AI · Jun 36/10
🧠Researchers propose the Pre-Reasoning Perception Framework (PRPF), a two-stage system that improves mobile agent efficiency by separating intervention detection from task reasoning. The framework uses a lightweight perceptor to decide when assistance is needed before activating a larger reasoning model, reducing false triggers and computational overhead.
AINeutralarXiv – CS AI · Jun 26/10
🧠SpeedAug is a new reinforcement learning framework that accelerates robotic policy execution by learning optimal task speeds rather than relying on conservative demonstration data. The method combines tempo-enriched policy learning with RL fine-tuning to achieve 1.8x faster real-world task throughput while maintaining success rates.
AIBullisharXiv – CS AI · Jun 16/10
🧠Researchers introduce SAGE, a memory management system for agentic LLMs that uses novelty detection to efficiently control when new facts are added, merged, or ignored. The approach reduces API costs and latency by 3.4× and 2.5× respectively while maintaining quality, addressing a critical gap in write-side memory control for long-context AI agents.
🧠 GPT-4
AIBullisharXiv – CS AI · May 116/10
🧠Researchers introduce HyperEyes, a parallel multimodal search agent that processes multiple entities concurrently rather than sequentially, achieving 9.9% higher accuracy with 5.3x fewer tool calls than comparable systems. The system combines visual grounding and retrieval into atomic actions and uses dual-level reinforcement learning to optimize both accuracy and inference efficiency, addressing a gap in existing multimodal AI benchmarks that ignore computational cost.