y0news

#llm-optimization News & Analysis

139 articles tagged with #llm-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

ScaleDoc is a new system that enables efficient semantic analysis of large document collections with LLMs by combining offline document representations with lightweight online filtering. The system achieves a 2x speedup and cuts expensive LLM calls by up to 85% through contrastive learning and adaptive cascade mechanisms.
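
The cascade idea can be illustrated with a generic two-stage filter: a cheap proxy score settles confident cases, and only documents in an uncertain band are sent to the expensive LLM predicate. This is a minimal sketch under stated assumptions, not ScaleDoc's implementation; `cheap_score`, `llm_predicate`, and the `low`/`high` thresholds are hypothetical.

```python
from typing import Callable, List, Tuple


def cascade_filter(
    docs: List[str],
    cheap_score: Callable[[str], float],   # hypothetical proxy: estimated P(predicate holds)
    llm_predicate: Callable[[str], bool],  # hypothetical stand-in for the expensive LLM call
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[str], int]:
    """Return the documents that pass the predicate and the number of LLM calls made."""
    kept: List[str] = []
    llm_calls = 0
    for doc in docs:
        score = cheap_score(doc)
        if score >= high:      # confident accept: no LLM call
            kept.append(doc)
        elif score <= low:     # confident reject: no LLM call
            continue
        else:                  # uncertain band: fall back to the LLM
            llm_calls += 1
            if llm_predicate(doc):
                kept.append(doc)
    return kept, llm_calls


# Usage with toy proxy scores (hypothetical data).
docs = ["refund request", "weather chat", "refund maybe?", "hello"]
scores = {"refund request": 0.95, "weather chat": 0.05, "refund maybe?": 0.5, "hello": 0.1}
kept, calls = cascade_filter(docs, lambda d: scores[d], lambda d: "refund" in d)
print(kept, calls)  # ['refund request', 'refund maybe?'] 1
```

Widening the uncertain band trades more LLM calls for fewer proxy mistakes; the returned call count is the quantity such cascades try to shrink.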

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 4

You Only Fine-tune Once: Many-Shot In-Context Fine-Tuning for Large Language Models

Researchers propose Many-Shot In-Context Fine-tuning (ManyICL), a novel approach that significantly improves large language model performance by treating multiple in-context examples as supervised training targets rather than just prompts. The method narrows the performance gap between in-context learning and dedicated fine-tuning while mitigating catastrophic forgetting.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Researchers introduce Group Tree Optimization (GTO), a new training method that improves speculative decoding for large language models by aligning draft model training with the actual decoding policy. GTO increases acceptance length by 7.4% and delivers an additional 7.7% speedup over existing state-of-the-art methods across multiple benchmarks and LLMs.
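
Acceptance length counts how many drafted tokens the target model keeps per verification step, so a longer acceptance length means fewer expensive target passes per generated token. Below is a minimal sketch of standard greedy (temperature-0) speculative verification, assuming integer token IDs; it illustrates the metric GTO optimizes, not GTO's training method, and the function names are hypothetical.

```python
from typing import List, Sequence


def accepted_prefix_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Number of leading draft tokens that match the target model's greedy choices."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def verify_step(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Tokens emitted by one greedy speculative-decoding step.

    `target` holds the target model's greedy token at each of the len(draft) + 1
    positions (all computed in one forward pass). Accepted draft tokens are kept,
    then one target token is appended: the correction at the first mismatch, or a
    bonus token when every draft token was accepted.
    """
    assert len(target) == len(draft) + 1
    n = accepted_prefix_length(draft, target)
    return list(draft[:n]) + [target[n]]


print(verify_step([5, 9, 2, 7], [5, 9, 4, 7, 1]))  # [5, 9, 4]
```

Conventions differ on whether the appended target token counts toward acceptance length, so reported figures are only comparable under the same definition.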

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

Researchers introduce FreeKV, a training-free optimization framework that substantially improves KV cache retrieval efficiency for large language models with long context windows. The system achieves up to a 13x speedup over existing methods while maintaining near-lossless accuracy through speculative retrieval and hybrid memory layouts.

$NEAR
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5

Arbor: A Framework for Reliable Navigation of Critical Conversation Flows

Researchers introduce Arbor, a framework that decomposes large language model decision-making into specialized node-level tasks for critical applications like healthcare triage. The system improves accuracy by 29.4 percentage points while reducing latency by 57.1% and costs by 14.4x compared to single-prompt approaches.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5

Towards Autonomous Memory Agents

Researchers introduce U-Mem, an autonomous memory agent system that actively acquires and validates knowledge for large language models. The system uses cost-aware knowledge extraction and semantic Thompson sampling to improve performance, showing significant gains on benchmarks like HotpotQA and AIME25.
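
The summary does not say how U-Mem's "semantic" variant works. As a reference point, plain Beta-Bernoulli Thompson sampling balances exploring uncertain options against exploiting ones that have already paid off. This sketch assumes binary rewards, and the two arms standing in for knowledge sources, with their success rates, are hypothetical.

```python
import random


class BetaThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, n_arms: int, seed: int = 0) -> None:
        self.alpha = [1.0] * n_arms  # 1 + observed successes (uniform Beta(1, 1) prior)
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one plausible success rate per arm from its posterior; play the best draw.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: bool) -> None:
        # Conjugate update: a success bumps alpha, a failure bumps beta.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


# Usage: two hypothetical knowledge sources with unknown success rates.
true_rates = [0.2, 0.8]
sampler = BetaThompsonSampler(len(true_rates), seed=42)
env = random.Random(7)
pulls = [0, 0]
for _ in range(500):
    arm = sampler.select()
    pulls[arm] += 1
    sampler.update(arm, env.random() < true_rates[arm])
print(pulls)  # the 0.8 arm should receive most of the pulls
```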

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Researchers present a systematic analysis of hybrid multi-agent systems combining cloud-based large language models with on-device small language models, revealing that optimal architecture design is highly task-dependent and that increased frontier compute does not guarantee better performance across the power-cost-accuracy Pareto frontier.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Researchers propose a novel method for optimizing multi-agent LLM systems by decomposing credit assignment into temporal and structural components, enabling more efficient prompt optimization through targeted refinement rather than global updates. The approach uses state-space bottleneck analysis and role-based policy isolation to identify and fix weak components in collaborative AI systems, reducing the number of queries while improving reasoning performance across benchmarks.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Researchers introduce NaRA (Noise-aware Low-Rank Adaptation), a parameter-efficient fine-tuning method designed specifically for diffusion large language models that adapts to the noise level during the denoising process. Unlike existing methods such as LoRA, which use static parameters, NaRA employs a hypernetwork to adjust the low-rank matrices dynamically based on noise, achieving better performance on reasoning and code generation tasks.
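
For reference, standard LoRA adds a fixed low-rank update to a frozen weight. A noise-aware variant, as the summary describes it, instead lets a hypernetwork produce the low-rank factors from the noise level. The notation below ($t$ for the noise level, $h_\phi$ for the hypernetwork) is assumed for illustration and is not taken from the paper.

```latex
% Standard LoRA: static factors A, B; W_0 stays frozen
W = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

% Noise-conditioned sketch: the factors are generated per noise level t
\bigl(A(t),\, B(t)\bigr) = h_\phi(t), \qquad W(t) = W_0 + \frac{\alpha}{r} B(t)\, A(t)
```

Only $\phi$ (and not $W_0$) is trained in this sketch, so the parameter count stays tied to the hypernetwork rather than to the number of noise levels.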

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Researchers introduce Loong, an AI agent designed to improve long document translation by selectively retrieving relevant context from a 3E memory module rather than processing all available information. The system uses reinforcement learning to optimize context selection and delivers significant translation quality improvements across multiple language pairs, with gains of up to 13 points on standard evaluation metrics.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Researchers introduce RACE-Sched, an asynchronous AI framework that combines real-time symbolic heuristics with LLM-powered reasoning to solve dynamic job shop scheduling problems in industrial systems. The approach decouples fast reactive execution from slower deliberative optimization, enabling superior performance over deep reinforcement learning baselines while maintaining interpretability and millisecond-level response times.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

CORE-T: COherent REtrieval of Tables for Text-to-SQL

CORE-T introduces a training-free framework for improving table retrieval in text-to-SQL systems by combining dense retrieval with LLM-generated metadata and compatibility caching. The approach achieves significant performance gains—up to 22.7 points in table-selection F1 and 24.4 points in multi-table execution accuracy—while reducing inference tokens by 64-76% compared to LLM-intensive alternatives.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Researchers introduce Agent-Radar, a training-free context management method that improves multi-agent LLM systems by dynamically filtering irrelevant information from long conversation histories. The technique uses temporal and spatial decay mechanisms to maintain focus on relevant context, achieving up to 7.64% performance improvements across five benchmarks.
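
The summary does not give Agent-Radar's exact decay functions. The sketch below assumes exponential decay over message age (temporal) and over hop distance between agents (spatial), multiplied by a relevance score, with low-weight messages dropped from the context; the field names, decay rates, and threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    text: str
    turn: int         # turn index when the message was produced
    hops: int         # hypothetical: distance in the agent graph from the reading agent
    relevance: float  # hypothetical: similarity to the current query, in [0, 1]


def decayed_weight(msg: Message, now: int, lam_t: float = 0.3, lam_s: float = 0.5) -> float:
    # Older messages and messages from more distant agents count for less.
    return msg.relevance * math.exp(-lam_t * (now - msg.turn)) * math.exp(-lam_s * msg.hops)


def filter_context(history: List[Message], now: int, threshold: float = 0.2) -> List[Message]:
    # Keep chronological order; drop messages whose decayed weight falls below the threshold.
    return [m for m in history if decayed_weight(m, now) >= threshold]


history = [
    Message("task plan", turn=3, hops=1, relevance=0.9),
    Message("off-topic aside", turn=4, hops=0, relevance=0.1),
    Message("latest result", turn=5, hops=1, relevance=0.8),
]
print([m.text for m in filter_context(history, now=6)])  # ['task plan', 'latest result']
```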

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints

TIMEGATE is a new policy framework that optimizes machine learning system adaptation by managing computational budgets across training, labeling, and evaluation cycles. The research reports 2.3x efficiency gains in labeling versus training and 66% evaluation-compute savings without compromising model accuracy, with results validated on tabular data and on large language models such as LLaMA-3.1-8B.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Researchers propose BaSE, a multi-armed bandit algorithm that optimizes how large language models allocate computational resources during evolutionary search tasks. By dynamically distributing LLM calls across parallel trajectories, BaSE improves mean fitness by 12.3% over existing baselines while addressing the reliability gap between reported best-case and typical run performance.
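
BaSE's own allocation rule is not described in the summary. To illustrate the general multi-armed bandit framing, here is a sketch using UCB1, a standard bandit algorithm swapped in for illustration, that treats each parallel trajectory as an arm and each LLM call as a pull; the reward (a hypothetical 0/1 "fitness improved" signal) is a stand-in.

```python
import math
import random
from typing import Callable, List


def ucb1_allocate(
    n_arms: int,
    budget: int,
    pull: Callable[[int], float],  # hypothetical: one LLM call on trajectory i, reward in [0, 1]
) -> List[int]:
    """Spend `budget` calls across `n_arms` trajectories with UCB1; return calls per arm."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(budget):
        if t < n_arms:
            arm = t  # play every arm once to initialise its estimate
        else:
            # Mean reward plus an exploration bonus that shrinks as an arm is played more.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        totals[arm] += pull(arm)
    return counts


rng = random.Random(0)
means = [0.1, 0.2, 0.9]  # hypothetical chance that a call improves each trajectory
counts = ucb1_allocate(3, 500, lambda i: 1.0 if rng.random() < means[i] else 0.0)
print(counts)  # the 0.9 trajectory should receive the large majority of calls
```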

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Researchers introduce CoHyDE, an iterative co-training method that jointly optimizes a dense encoder and LLM rewriter to improve tool retrieval for AI agents. The approach outperforms single-component baselines by 2.5-8 percentage points on standard and vague queries, addressing the fundamental challenge of bridging colloquial user language with technical API vocabularies.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

Researchers demonstrate an autoresearch framework where an AI agent autonomously optimizes LLM-based policy synthesis for multi-agent cooperation problems. The system discovers objective-dependent pipeline designs that outperform hand-crafted baselines, with fairness mechanisms emerging only when optimizing for equitable outcomes rather than efficiency.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting

Researchers identify a critical failure mode in test-time reinforcement learning (TTRL) in which majority voting locks onto incorrect answers, permanently suppressing correct signals on problems the model rarely solves. They introduce TTRL-Guard, a framework that uses flip-rate monitoring and selective updating to prevent this "Correct-Answer Extinction Window," achieving a 54% relative improvement on the AIME 2025 benchmark.
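
In TTRL the pseudo-reward comes from a majority vote over sampled answers, so once the vote settles on a wrong answer the correct one stops being rewarded. A minimal sketch of the two quantities involved, majority-vote pseudo-labels and a round-to-round flip rate, assuming answers are compared as exact strings; how TTRL-Guard thresholds the flip rate to decide which updates to apply is not specified here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_label(answers: Sequence[Hashable]) -> Hashable:
    """Most common sampled answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def flip_rate(labels: Sequence[Hashable]) -> float:
    """Fraction of consecutive rounds in which the majority label changed."""
    if len(labels) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return flips / (len(labels) - 1)


# Sampled answers to one problem over four training rounds (hypothetical data).
rounds = [
    ["12", "12", "7", "12"],
    ["7", "7", "12", "7"],
    ["7", "7", "7", "12"],
    ["7", "7", "7", "7"],
]
labels = [majority_label(r) for r in rounds]
print(labels)             # ['12', '7', '7', '7']
print(flip_rate(labels))  # 0.3333333333333333
```

A label that stops flipping only says the vote has stabilized, not that it is correct, which is why stability alone cannot rule out lock-in on a wrong answer.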

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

MGRetrieval: Memory-Guided Reflective Retrieval for Long-Term Dialogue Agents

Researchers introduce MGRetrieval, a novel retrieval strategy for long-term dialogue agents that uses semantic memory structures to guide multi-step retrieval rather than one-shot approaches. The method improves performance on dialogue benchmarks by 8-11% while maintaining computational efficiency, addressing a key limitation in LLM-based conversational systems.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

Researchers present twelve token optimization strategies for using LLMs to migrate Oracle databases to PostgreSQL, addressing cost and quality degradation challenges. Adaptive routing emerges as the optimal approach, reducing token consumption by 8.72% while maintaining 88.40% semantic match accuracy, demonstrating that token optimization requires balancing multiple objectives rather than simple prompt shortening.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Researchers present an LLM-powered framework that enables non-expert end users to re-optimize deployed decision-support systems through natural language interaction, eliminating dependency on operations research specialists. The system combines language models with an optimization toolbox to dynamically adapt models to changing business conditions while maintaining solution quality and interpretability.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Researchers demonstrate that cross-lingual contrastive preference tuning (CroCo) enables large language models to improve performance across 14 languages without language-specific annotations by leveraging English-trained reward models. The method shows consistent gains in both structured and open-ended generation tasks across multiple languages while avoiding catastrophic forgetting.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Persona Generators: Generating Diverse Synthetic Personas for Arbitrary Contexts

Researchers introduce Persona Generators, AI functions that create diverse synthetic populations for evaluating AI systems across varied user demographics without needing extensive real-world data collection. Using iterative optimization with large language models, the approach generates lightweight code that produces synthetic personas spanning rare trait combinations and long-tail behaviors, outperforming existing baselines on diversity metrics.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

Scaling GraphLLM with Bilevel-Optimized Sparse Querying

Researchers introduce BOSQ, a framework that optimizes the use of large language models for graph neural network tasks by selectively querying LLMs only when necessary. This approach reduces computational costs by orders of magnitude while maintaining or improving performance on text-attributed graph datasets, addressing a critical bottleneck in practical LLM-enhanced graph learning.

Page 3 of 6