#efficiency-optimization News & Analysis

10 articles tagged with #efficiency-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

Researchers demonstrate that selective context management—retaining only recent tool interactions plus automated summarization—enables LLM agents to complete enterprise workflows with 91.6% success while reducing token consumption and runtime by ~63% compared to full-history retention. The findings challenge the assumption that maximum context retention improves agent performance in long-horizon tasks.

🧠 GPT-5 · 🧠 Claude · 🧠 Sonnet
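The context strategy summarized in this entry (keep recent tool interactions verbatim, summarize the rest) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `build_context`, `keep_recent`, and `naive_summary` are hypothetical names, and a real system would presumably call an LLM where the placeholder summarizer is used.

```python
from typing import Callable, List


def build_context(
    history: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Keep the last `keep_recent` interactions verbatim and fold older ones
    into a single summary entry placed at the front of the context."""
    if keep_recent <= 0:
        recent, older = [], list(history)
    else:
        recent, older = history[-keep_recent:], history[:-keep_recent]
    if not older:
        # Nothing old enough to summarize; return the history unchanged.
        return list(recent)
    return [summarize(older)] + list(recent)


def naive_summary(items: List[str]) -> str:
    # Placeholder summarizer (assumption): only reports how much was folded.
    return f"[summary of {len(items)} earlier tool interactions]"


context = build_context(["a", "b", "c", "d"], keep_recent=2, summarize=naive_summary)
```

The design choice being illustrated is that context size stays bounded by `keep_recent` plus one summary entry, regardless of how long the task runs.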

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

Researchers introduce Dynamic Thinking-Token Selection (DynTS), a method that optimizes Large Reasoning Models by identifying and retaining only decision-critical tokens during inference while discarding redundant reasoning trace data. This approach significantly reduces memory footprint and computational overhead, addressing a major efficiency bottleneck in LRMs that generate extended reasoning sequences.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation

FastSLM introduces a Hierarchical Temporal Abstractor (HTA) that compresses long-form speech into just 1.67 tokens per second—a 97% reduction—while maintaining competitive performance on speech understanding benchmarks. This architecture solves a critical scaling bottleneck for multimodal AI models by preserving acoustic detail despite extreme compression, enabling efficient deployment of speech-capable language models.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval

DynaTree is a two-stage framework for efficient news retrieval that combines offline agentic reasoning with lightweight online subtree selection, achieving significant improvements in real-world deployment. The system demonstrated a 59-73% survival rate versus 32-53% for fixed approaches in production A/B testing, highlighting the practical value of persistent semantic expansion for time-sensitive information retrieval.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

Researchers introduce SPARK, a framework that verifies AI agent skills through direct environment interaction rather than relying on pre-written plans. The Posterior Distillation Index (PDI) metric ensures skills are grounded in actual task evidence, producing student models that match or exceed human-written skills while reducing inference costs by up to 1,000x.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Learning Adaptive Parallel Execution for Efficient Code Localization

Researchers introduce FuseSearch, an AI system that optimizes parallel code localization by reducing redundant tool invocations from 34.9% to near-zero through adaptive execution strategies. The approach combines supervised fine-tuning and reinforcement learning to dynamically adjust search breadth, achieving state-of-the-art performance on SWE-bench while using 68.9% fewer tokens and delivering 93.6% speedup.

AI · Bullish · arXiv – CS AI · Jun 3 · 6/10

Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents

Researchers propose the Pre-Reasoning Perception Framework (PRPF), a two-stage system that improves mobile agent efficiency by separating intervention detection from task reasoning. The framework uses a lightweight perceptor to decide when assistance is needed before activating a larger reasoning model, reducing false triggers and computational overhead.
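The two-stage idea described here, a cheap check that decides whether the expensive model runs at all, might look something like the sketch below. It is an illustration under stated assumptions, not PRPF itself: `run_agent`, `perceptor`, and `reasoner` are hypothetical names, and the keyword-based perceptor merely stands in for whatever lightweight model the framework uses.

```python
from typing import Callable, List


def run_agent(
    events: List[str],
    perceptor: Callable[[str], bool],
    reasoner: Callable[[str], str],
) -> List[str]:
    """Invoke the expensive reasoner only for events the perceptor flags."""
    responses = []
    for event in events:
        if perceptor(event):  # stage 1: cheap intervention check
            responses.append(reasoner(event))  # stage 2: costly reasoning
    return responses


reasoner_calls = 0


def toy_perceptor(event: str) -> bool:
    # Assumption: a keyword check stands in for a small learned model.
    return "error" in event.lower()


def toy_reasoner(event: str) -> str:
    # Counts invocations so the saving from gating is visible.
    global reasoner_calls
    reasoner_calls += 1
    return f"assist: {event}"


events = ["screen unlocked", "Payment error shown", "scrolling feed"]
responses = run_agent(events, toy_perceptor, toy_reasoner)
```

In this toy run the reasoner is called once for three events, which is the kind of saving the gating step is meant to provide.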

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

SpeedAug is a new reinforcement learning framework that accelerates robotic policy execution by learning optimal task speeds rather than relying on conservative demonstration data. The method combines tempo-enriched policy learning with RL fine-tuning to achieve 1.8x faster real-world task throughput while maintaining success rates.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

Researchers introduce SAGE, a memory management system for agentic LLMs that uses novelty detection to control when new facts are added, merged, or ignored. The approach reduces API costs by 3.4× and latency by 2.5× while maintaining quality, addressing a gap in write-side memory control for long-context AI agents.

🧠 GPT-4
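The add, merge, or ignore decision described in this entry could be sketched as a simple threshold gate. This is a rough illustration only, not SAGE's method: the function names, the Jaccard word-overlap score used as a stand-in for novelty detection, and both thresholds are assumptions made for the example.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def novelty_gate(
    memory: List[str],
    fact: str,
    ignore_at: float = 0.9,  # hypothetical threshold: near-duplicate
    merge_at: float = 0.5,   # hypothetical threshold: related fact
) -> str:
    """Decide whether `fact` is ignored, merged into its closest entry,
    or appended as a new memory. Mutates `memory` in place."""
    best_index, best_score = -1, 0.0
    for i, entry in enumerate(memory):
        score = jaccard(entry, fact)
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= ignore_at:
        return "ignore"
    if best_score >= merge_at:
        memory[best_index] = memory[best_index] + " | " + fact
        return "merge"
    memory.append(fact)
    return "add"


memory: List[str] = []
decisions = [
    novelty_gate(memory, "user prefers dark mode"),
    novelty_gate(memory, "user prefers dark mode"),
    novelty_gate(memory, "user prefers dark mode at night"),
    novelty_gate(memory, "meeting moved to friday"),
]
```

Only the "add" and "merge" branches write to memory, so any per-write cost (such as an API call in a real system) is skipped for near-duplicates.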

AI · Bullish · arXiv – CS AI · May 11 · 6/10

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Researchers introduce HyperEyes, a parallel multimodal search agent that processes multiple entities concurrently rather than sequentially, achieving 9.9% higher accuracy with 5.3x fewer tool calls than comparable systems. The system combines visual grounding and retrieval into atomic actions and uses dual-level reinforcement learning to optimize both accuracy and inference efficiency, addressing a gap in existing multimodal AI benchmarks that ignore computational cost.