#token-optimization News & Analysis

17 articles tagged with #token-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

Researchers introduce MAGE, a novel memory management system for LLM-based agents that organizes task histories as hierarchical state trees rather than semantic similarity clusters. The approach achieves 7.8-20.4 percentage point improvements in task success rates while reducing token consumption by 55.1% on long-horizon tasks with interdependent decisions.
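
One way to picture "execution state" memory, as opposed to similarity-based retrieval, is a tree of subtasks. In this tree the prompt for the active subtask contains only the summaries on its root-to-node path. The sketch below is a minimal illustration under that assumption. `StateNode`, `context_for`, and `flat_history` are hypothetical names, not MAGE's API, and the paper's tree structure and retrieval rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateNode:
    """Hypothetical node in a task-state tree: one subtask and a summary of its outcome."""
    summary: str
    parent: Optional[StateNode] = field(default=None, repr=False, compare=False)
    children: list[StateNode] = field(default_factory=list, repr=False)

    def add_child(self, summary: str) -> StateNode:
        child = StateNode(summary, parent=self)
        self.children.append(child)
        return child


def context_for(node: StateNode) -> list[str]:
    """Summaries on the root-to-node path, i.e. the state the active subtask depends on."""
    path: list[str] = []
    current: Optional[StateNode] = node
    while current is not None:
        path.append(current.summary)
        current = current.parent
    return path[::-1]


def flat_history(root: StateNode) -> list[str]:
    """Every recorded summary in depth-first order: the naive full-history baseline."""
    out = [root.summary]
    for child in root.children:
        out.extend(flat_history(child))
    return out


root = StateNode("goal: plan a two-leg trip")
leg1 = root.add_child("leg 1: flight booked")
leg1.add_child("leg 1: hotel booked")
leg2 = root.add_child("leg 2: comparing flights")

print(context_for(leg2))        # ['goal: plan a two-leg trip', 'leg 2: comparing flights']
print(len(flat_history(root)))  # 4
```

Here the leg 2 context skips the leg 1 hotel details entirely. That kind of pruning is where token savings on long, branching tasks would come from.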

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Researchers introduce MicroSkill Architecture, a modular framework that organizes AI coding knowledge into atomic skill capsules rather than feeding entire codebases to language models. The approach reduces token consumption by 90%, doubles compilation success rates, and eliminates architectural violations in enterprise systems.

AI · Neutral · arXiv – CS AI · Jun 4 · 7/10

OckBench: Measuring the Efficiency of LLM Reasoning

Researchers introduce OckBench, which they describe as the first benchmark to measure both accuracy and token efficiency in large language models. They find that models solving identical problems can differ by up to 5.0x in token usage. The results point to significant inefficiencies that inflate serving costs and latency, and they argue for evaluating token efficiency alongside accuracy.

🧠 GPT-5 · 🧠 Gemini
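
A figure like "up to 5.0x" is a ratio of token counts on the same problem. The sketch below shows that arithmetic under two assumptions: the comparison is restricted to problems that every compared model answers correctly, and the input records are made up. `max_token_ratio` is a hypothetical helper; OckBench's actual efficiency metric may be defined differently.

```python
from collections import defaultdict

# Hypothetical records: (model, problem_id, solved, tokens_used)
records = [
    ("model_a", "p1", True, 400),
    ("model_b", "p1", True, 2000),
    ("model_a", "p2", True, 300),
    ("model_b", "p2", False, 900),
]


def max_token_ratio(records) -> float:
    """Largest max/min token ratio over problems solved by at least two models."""
    solved = defaultdict(list)
    for _model, pid, ok, tokens in records:
        if ok:  # only correct answers count toward the efficiency comparison
            solved[pid].append(tokens)
    ratios = [max(t) / min(t) for t in solved.values() if len(t) >= 2]
    return max(ratios, default=1.0)


print(max_token_ratio(records))  # 5.0
```

Problem `p2` is excluded because only one model solved it. A cheap wrong answer should not count as efficiency.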

AI · Bullish · arXiv – CS AI · Jun 3 · 7/10

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Researchers propose CLEAR, an economic optimization framework for allocating computational budgets during LLM inference by modeling resource allocation as a constrained optimization problem. The approach uses a global shadow price mechanism to redistribute tokens from queries unlikely to succeed to those near performance thresholds, achieving up to 3x accuracy improvements in resource-constrained environments.
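
CLEAR's exact formulation is not given in this summary. The sketch below is a generic illustration of shadow-price allocation, built on a modeling assumption of our own rather than the paper's: each query has a concave success curve p(b) = c(1 - e^(-b/s)), where c is a success ceiling and s a token scale. A single price λ is found by bisection so that the allocations just fit the budget. Queries whose marginal success per token never reaches λ receive zero tokens, which mirrors the redistribution described above.

```python
import math


def tokens_at_price(queries, lam):
    """Per-query b maximizing p(b) - lam * b under the assumed curve p(b) = c * (1 - exp(-b / s)).

    p'(b) = (c / s) * exp(-b / s); setting p'(b) = lam gives b = s * ln(c / (s * lam)).
    """
    return [max(0.0, s * math.log(c / (s * lam))) if c > 0 else 0.0 for c, s in queries]


def allocate(queries, budget, iters=200):
    """Bisect on the shadow price lam until the total allocation fits `budget`."""
    lo = 1e-12                           # assumed low enough that allocations overspend
    hi = max(c / s for c, s in queries)  # at this price every query gets 0 tokens
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(tokens_at_price(queries, mid)) > budget:
            lo = mid                     # overspending: raise the price
        else:
            hi = mid                     # within budget: try a lower price
    return tokens_at_price(queries, hi), hi


# (c, s) pairs: two promising queries and one with a 5% success ceiling.
queries = [(0.9, 500.0), (0.9, 500.0), (0.05, 500.0)]
alloc, price = allocate(queries, budget=1000)
print([round(b) for b in alloc])  # [500, 500, 0]
```

The low-ceiling query is priced out entirely, and its share goes to the two queries with the higher marginal return.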

AI · Bullish · arXiv – CS AI · May 12 · 7/10

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards

Researchers introduce DUET, a method for optimizing token allocation in reinforcement learning with verifiable rewards that jointly controls which prompts receive rollouts and how long each rollout runs. The technique achieves superior reasoning quality on math and coding benchmarks while using 50% fewer tokens than baseline methods, suggesting efficiency gains don't require sacrificing model performance.

🧠 Llama

AI · Bullish · arXiv – CS AI · May 4 · 7/10

A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction

Researchers introduce A11y-Compressor, a framework that optimizes how AI agents interpret graphical user interfaces by transforming accessibility trees into more efficient representations. The approach reduces input tokens by 78% while simultaneously improving task success rates by 5.1 percentage points, addressing a critical bottleneck in GUI automation systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Researchers introduce SwiReasoning, a training-free framework that improves large language model reasoning by dynamically switching between explicit chain-of-thought and latent reasoning modes. The method achieves 1.8%-3.1% accuracy improvements and 57%-79% better token efficiency across mathematics, STEM, coding, and general benchmarks.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

Dual Latent Memory for Visual Multi-agent System

Researchers propose L²-VMAS, a framework addressing the 'scaling wall' problem in Visual Multi-Agent Systems where adding more agents degrades performance despite higher computational costs. The solution uses dual latent memory and entropy-driven triggering to improve accuracy by 2.7-5.4% while reducing token usage by 21.3-44.8%.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Researchers propose Causal Minimal Tool Filtering (CMTF), a training-free method that improves LLM agent reliability by exposing only the tools needed at each step rather than the entire tool menu. The approach cuts token usage by 90% and reduces the number of tools exposed per step from 100 to 1 while maintaining task success rates.
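
CMTF decides which tools are necessary through causal analysis, and that part is not reproduced here. The sketch below swaps in a much simpler stand-in to show the general idea of per-step tool exposure. Given hypothetical capability tags for each tool and for the current step, a greedy set cover picks a small set of tools to expose. Greedy set cover yields a small set but is not guaranteed to be minimal. `minimal_tools` and the capability tags are illustrative assumptions.

```python
def minimal_tools(required: set[str], tools: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: expose few tools that together provide every required capability."""
    chosen: list[str] = []
    uncovered = set(required)
    while uncovered:
        # Pick the tool that covers the most still-uncovered capabilities.
        best = max(tools, key=lambda name: len(tools[name] & uncovered), default=None)
        if best is None or not tools[best] & uncovered:
            raise ValueError(f"no tool provides: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= tools[best]
    return chosen


# A 100-tool menu: 99 unrelated tools plus the one this step actually needs.
menu = {f"tool_{i}": {f"cap_{i}"} for i in range(99)}
menu["search_web"] = {"web_search"}
print(len(menu), minimal_tools({"web_search"}, menu))  # 100 ['search_web']
```

Only the selected tool's schema would then be serialized into the prompt. That is where savings of the kind reported above would come from.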

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

Researchers introduce a failure-aware observability framework for diagnosing wasted computation in multi-agent LLM systems, identifying six failure modes from online trace signals. On 165 GAIA validation traces, the analysis finds a 41% failure rate across difficulty levels and token consumption ranging from 8,152 to 16,389 tokens. The authors position observability as a diagnostic layer between raw execution logs and final accuracy metrics.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

BAGEN: Are LLM Agents Budget-Aware?

Researchers introduce BAGEN, a framework for evaluating whether large language model agents properly manage computational budgets during execution. The study reveals that frontier AI models consistently fail to predict remaining costs and continue spending resources on unlikely-to-succeed tasks, though budget-aware training can reduce token waste by 28-64% on failed trajectories.
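
Budget awareness can be reduced to a simple decision rule: keep going only if the next phase is affordable and its expected payoff exceeds its expected cost. The sketch below is a generic expected-value stop rule, not BAGEN's evaluation protocol or training method. It assumes the agent already has estimates of its success probability and remaining cost, which are exactly the quantities the study reports frontier models predicting poorly. `should_continue` and its parameters are hypothetical.

```python
def should_continue(p_success: float, reward: float,
                    expected_cost: float, remaining_budget: float) -> bool:
    """Hypothetical budget-aware stop rule; reward and costs share one unit (e.g. tokens)."""
    if expected_cost > remaining_budget:
        return False                           # cannot afford to finish
    return p_success * reward > expected_cost  # expected value must exceed expected spend


print(should_continue(0.8, 10_000, 2_000, 5_000))  # True
print(should_continue(0.1, 10_000, 2_000, 5_000))  # False
```

The second call shows the wasteful case: the task is affordable, but its low chance of success does not justify the spend.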

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Researchers benchmark token-optimized data formats (TRON and TOON) against JSON in agentic AI systems and find that TRON reduces token consumption by up to 27% with acceptable accuracy trade-offs. Although these alternatives show promise on isolated tasks, multi-turn agent loops expose their limitations, particularly TOON's parsing cascades and its handling of parallel tool calls.
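
Much of the saving in such formats comes from stating field names once for a uniform array instead of repeating them in every object. The sketch below shows that idea with a header-once tabular encoding in the spirit of TOON's tabular arrays. It does not implement the TOON or TRON specification. It uses character count as a rough proxy for tokens, and `to_tabular` is a hypothetical helper.

```python
import json

rows = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "viewer"},
    {"id": 3, "name": "Carol", "role": "editor"},
]


def to_tabular(key: str, items: list[dict]) -> str:
    """Emit the field names once in a header, then one comma-separated line per record."""
    fields = list(items[0])
    header = f"{key}[{len(items)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(item[f]) for f in fields) for item in items]
    return "\n".join([header, *lines])


as_json = json.dumps({"users": rows})
as_table = to_tabular("users", rows)
print(as_table)
print(len(as_json), len(as_table))  # character counts, a rough stand-in for tokens
```

This sketch does no quoting or escaping, so a value containing a comma or newline would corrupt its row. Real formats need rules for such cases, and those rules are one place parse errors can arise once models generate the format themselves.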

AI · Neutral · arXiv – CS AI · May 12 · 6/10

RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation

Researchers introduce RADAR, a framework that optimizes multi-agent LLM communication structures through adaptive diffusion models, reducing token consumption while improving task accuracy. The approach moves beyond fixed communication topologies to enable dynamic, task-specific agent coordination across diverse computational problems.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture

ANX is a new protocol-first framework designed for AI agent interaction, featuring a 3EX decoupled architecture that reduces token consumption by up to 66% compared to existing methods. The open-source protocol addresses security and efficiency issues in current AI agent implementations through agent-native design and integrated CLI, Skill, and MCP components.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

Research reveals that multi-agent LLM committees suffer from 'representational collapse' where agents produce highly similar outputs despite different role prompts, with mean cosine similarity of 0.888. A new diversity-aware consensus protocol (DALC) improves accuracy to 87% while reducing token costs by 26% compared to traditional self-consistency methods.
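
The 0.888 figure is a mean cosine similarity between agent outputs. The sketch below assumes each agent's answer has already been embedded as a vector; the embedding model the study used is not stated here. Under that assumption, the statistic is the average cosine similarity over all unordered agent pairs. This illustrates only the measurement, not the DALC protocol.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_similarity(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over all unordered pairs of agent-output embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Two agents give identical answers, a third diverges completely.
print(mean_pairwise_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # ≈ 0.333
```

Values near 1 indicate that the committee members are effectively repeating one another. Such a committee adds cost without adding independent evidence.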

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

One-Token Verification for Reasoning Correctness Estimation

Researchers introduce One-Token Verification (OTV), a new method that estimates reasoning correctness in large language models during a single forward pass, reducing computational overhead. OTV reduces token usage by up to 90% through early termination while improving accuracy on mathematical reasoning tasks compared to existing verification methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 4/10

Token-Efficient Item Representation via Images for LLM Recommender Systems

Researchers propose I-LLMRec, a new method for AI recommender systems that uses images instead of lengthy text descriptions to represent items, reducing computational token usage while maintaining recommendation quality. The approach leverages the information overlap between images and descriptions to create more efficient and robust LLM-based recommendation systems.