#adaptive-inference News & Analysis

8 articles tagged with #adaptive-inference. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

AIR: Adaptive Interleaved Reasoning with Code in MLLMs

Researchers propose AIR, a framework that gives multimodal large language models (MLLMs) adaptive reasoning capabilities through interleaved code execution and reinforcement learning. The approach addresses a limitation of existing vision-focused tools by enabling models to handle complex numerical computations, and it reports a 6.1 percentage-point performance improvement and tool-use success rates above 95%.

🏢 OpenAI · 🧠 o1 · 🧠 o3

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs

Researchers propose a conflict-aware paradigm for large language models that dynamically balances external context against parametric knowledge, addressing failures in existing contrastive decoding methods. The work introduces Adaptive Regime Routing (ARR) to resolve fundamental asymmetries in how models handle contradictory information, improving resistance to erroneous context by 3-5x while maintaining performance on correct context.
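
For orientation, here is a minimal sketch of the context-aware contrastive decoding baseline that this line of work builds on. It is not ARR itself, which the summary does not specify. The combination rule, the function name `contrastive_logits`, and the `alpha` parameter are assumptions taken from the commonly used formulation, which amplifies whatever the context changes in the model's next-token scores.

```python
from typing import List


def contrastive_logits(
    with_context: List[float],
    without_context: List[float],
    alpha: float,
) -> List[float]:
    """Combine next-token logits computed with and without the external context.

    Assumed rule: (1 + alpha) * logit_with - alpha * logit_without.
    alpha = 0 returns the ordinary context-conditioned logits; a larger alpha
    pushes the output further toward what the context adds.
    """
    if len(with_context) != len(without_context):
        raise ValueError("logit vectors must cover the same vocabulary")
    return [
        (1.0 + alpha) * w - alpha * wo
        for w, wo in zip(with_context, without_context)
    ]
```

A fixed `alpha` trusts the context equally whether it is right or wrong, which is the kind of asymmetry a conflict-aware method would need to handle.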

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Not All Claims Are Equally Risky: FACTOR for Adaptive Verification in Factual Long-Form Generation

Researchers introduce FACTOR, an inference-time verification system that adaptively checks factual claims in LLM-generated text based on individual claim uncertainty rather than applying uniform verification to all statements. The approach simultaneously improves factuality and reduces computational verification costs on the FactScore benchmark.
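
The idea of checking claims selectively can be sketched as follows. This is an illustrative sketch only, not FACTOR's actual procedure: the per-claim uncertainty scores, the fixed `threshold`, and the `verify` callback are all hypothetical stand-ins.

```python
from typing import Callable, List, Tuple


def adaptive_verify(
    claims: List[Tuple[str, float]],
    verify: Callable[[str], bool],
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Verify only claims whose uncertainty reaches the threshold.

    `claims` holds (text, uncertainty) pairs with uncertainty in [0, 1].
    Returns the claims that are kept and the number of verifier calls made.
    """
    kept: List[str] = []
    calls = 0
    for text, uncertainty in claims:
        if uncertainty < threshold:
            # Low-risk claim: accept it without paying for verification.
            kept.append(text)
            continue
        calls += 1
        if verify(text):
            kept.append(text)
    return kept, calls
```

With uniform verification every claim costs one verifier call. Here the call count scales with the number of risky claims, which is where the cost saving would come from.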

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

AVIS: Adaptive Test-Time Scaling for Vision-Language Models

Researchers introduce AVIS, a lightweight adaptive policy that optimizes inference efficiency in Vision-Language Models by jointly scaling visual context and reasoning computation. The method uses token pruning and difficulty prediction to reduce computational costs while maintaining or improving accuracy across image and video reasoning tasks.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents

Researchers demonstrate that the signals used by adaptive compute gates for LLM agents are unstable across environments and models and can even reverse direction: the same confidence metric predicts beneficial outcomes in some settings and harmful ones in others. They propose DIAL, a learned gating mechanism trained through counterfactual exploration, which outperforms fixed-direction baselines by accounting for task-specific utility directions.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers

Researchers present Budgeted Attention Allocation, a mechanism that allows a single transformer model to operate at multiple efficiency-accuracy tradeoffs by dynamically gating attention heads based on computational budgets. The approach achieves measurable speedups (1.2-1.28x) on CPU benchmarks while maintaining competitive accuracy across multiple datasets, enabling flexible deployment scenarios without retraining.
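
One way to picture budget-conditioned head gating is a mask that keeps only the highest-scoring heads allowed by a budget. This is a simplified sketch, not the paper's mechanism: the per-head importance `scores`, the top-k rule, and the name `head_mask` are assumptions, and a learned gate could behave quite differently.

```python
import math
from typing import List


def head_mask(scores: List[float], budget: float) -> List[bool]:
    """Return a keep/drop mask over attention heads for a given budget.

    `budget` is the fraction of heads allowed to run, in (0, 1].
    At least one head is always kept.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    k = max(1, math.floor(budget * len(scores)))
    # Indices of the k highest-scoring heads.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]
```

Because the mask depends only on the budget value passed at inference time, the same weights can serve several efficiency-accuracy operating points, which matches the "without retraining" claim in the summary.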

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents

Researchers have developed EcoThink, an energy-aware AI framework that reduces inference energy consumption by 40.4% on average while maintaining performance. The system uses adaptive routing to skip unnecessary computation for simple queries while preserving deep reasoning for complex tasks, addressing sustainability concerns in large language model deployment.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Researchers introduce AdaBlock-dLLM, a training-free optimization technique for diffusion-based large language models that adaptively adjusts block sizes during inference based on semantic structure. The method addresses limitations in conventional fixed-block semi-autoregressive decoding, achieving up to 5.3% accuracy improvements under the same throughput budget.