#llm-optimization News & Analysis

226 articles tagged with #llm-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

Researchers introduce SegTreeMem, a novel memory architecture for long-horizon conversational AI agents that organizes conversation history using temporally-ordered segment trees instead of purely semantic similarity. The system demonstrates improved performance across multiple benchmarks by preserving chronological order while enabling hierarchical retrieval, with ablation studies confirming that temporal sequencing is critical to the approach's effectiveness.
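To make the idea of a temporally ordered segment tree concrete, here is a minimal sketch. It is not SegTreeMem's implementation: the `Node` class, the `build` and `cover` functions, and the use of string concatenation in place of a model-generated summary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                      # index of the first turn covered (inclusive)
    hi: int                      # index of the last turn covered (inclusive)
    summary: str                 # stand-in for a model-written summary of the span
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(turns: List[str], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build a binary tree over turns kept in chronological order."""
    if hi is None:
        if not turns:
            raise ValueError("need at least one turn")
        hi = len(turns) - 1
    if lo == hi:
        return Node(lo, hi, turns[lo])
    mid = (lo + hi) // 2
    left = build(turns, lo, mid)
    right = build(turns, mid + 1, hi)
    # Hypothetical summarizer: join the child summaries in time order.
    return Node(lo, hi, left.summary + " | " + right.summary, left, right)


def cover(node: Node, lo: int, hi: int) -> List[Node]:
    """Return the fewest nodes, in chronological order, that cover turns lo..hi."""
    if hi < node.lo or node.hi < lo:
        return []                # no overlap with the requested window
    if lo <= node.lo and node.hi <= hi:
        return [node]            # fully inside: use this node's summary as-is
    return cover(node.left, lo, hi) + cover(node.right, lo, hi)


root = build([f"turn {i}" for i in range(8)])
spans = [(n.lo, n.hi) for n in cover(root, 2, 6)]
```

Because children are always visited left to right, the retrieved spans come back in chronological order, which is the property the summary credits for the method's gains.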

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

Can Reasoning Path still be Effective as Input? Bridging Post-Reasoning to Chain-of-Thought Compression

Researchers propose Upfront CoT (UCoT), a framework that compresses Chain-of-Thought reasoning in large language models by using a lightweight compressor to generate soft token representations of reasoning paths. The method maintains reasoning performance while reducing token usage by 50% on benchmarks, addressing the efficiency-performance tradeoff in advanced LLM inference.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems

Researchers introduce GAMBLe, a framework for analyzing AI-Driven Research Systems (ADRS) that couple large language models with automated evaluation. Through 760+ experiments, the framework reveals that standard convergence guarantees fail to capture ADRS behavior, and component selection can improve performance by 13-67% depending on the problem.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

LLM-Driven Co-Evolutionary Automated Heuristic Design for Bi-Component Coupled Combinatorial Optimization

Researchers introduce CoEvo-AHD, an LLM-driven framework that co-evolves paired operator populations to solve coupled combinatorial optimization problems like the Traveling Thief Problem. Unlike previous automated heuristic design methods that treat operators in isolation, this approach captures interactions between decision components, achieving results competitive with traditional heuristics.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

Researchers propose a framework for incorporating Large Language Model (LLM) priors into multi-objective Bayesian optimization while maintaining robustness against miscalibrated advice. Using an objective-wise reputation mechanism and counterfactual gating, the approach dynamically adjusts trust in LLM suggestions based on observed performance rather than accepting them blindly, with empirical validation across molecular optimization tasks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

LLM-Evolved Pattern Generators for Optimal Classical Planning

Researchers have developed a novel method using large language models and evolutionary algorithms to automatically generate admissible heuristics for optimal classical planning problems. Unlike existing learned heuristics that improve search speed but cannot guarantee optimal solutions, this approach preserves A* optimality guarantees while matching or exceeding the performance of traditional domain-independent methods.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

Researchers propose DOPA, a demonstration retrieval framework that uses out-of-distribution proxies to improve large language model performance on tasks from inaccessible target domains. The method combines proxy-based evaluation with diversity constraints to enhance LLM robustness when facing severe distribution shifts.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

Researchers propose CSRP, a three-stage framework combining continual pre-training, chain-of-thought reasoning, and reinforcement learning to improve Chinese grammatical error correction in LLMs. The system achieves state-of-the-art performance on the NACGEC benchmark while addressing the over-correction problem common in supervised fine-tuning approaches.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models

Researchers introduce AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly addresses the asymmetrical processing of visual and linguistic data. The approach uses hyperbolic geometry for hierarchical relationships and evidence-priority mechanisms to improve accuracy by up to 3.8% on hallucination-sensitive tasks while reducing parameter activation by 25.45% compared to dense models.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

The LinguIUTics team placed 4th in the PsyDefDetect 2026 shared task by fine-tuning Qwen3-8B to classify psychological defense mechanisms in clinical conversational text, reaching a macro F1-score of 0.3917. Specialized techniques, including minority-class augmentation and ensemble methods, substantially improved performance on rare classes.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Researchers present a cost model for optimizing cross-GPU attention operations in large language models, finding that routing queries is often cheaper than moving cache blocks when models are distributed across multiple nodes. The work applies to sparse-attention architectures like those in DeepSeek and GLM models, offering practical guidance for inference optimization on multi-node clusters.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Dynamic Trust-Aware Sparse Communication Topology for LLM-Based Multi-Agent Consensus

Researchers propose DySCo, a dynamic sparse communication mechanism for LLM-based multi-agent systems that reduces computational overhead by selectively routing messages between agents rather than using full broadcast. The approach maintains consensus quality while cutting token costs and latency that scale quadratically with agent count, addressing a key efficiency bottleneck in collaborative AI reasoning systems.
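The quadratic scaling mentioned above can be checked with a little arithmetic. The sketch below counts messages per communication round under two schemes. The fixed fan-out `k` is an assumption chosen for illustration; the summary does not say how DySCo selects recipients, and both function names are hypothetical.

```python
def broadcast_messages(n_agents: int) -> int:
    """Messages per round when every agent sends to every other agent."""
    return n_agents * (n_agents - 1)


def sparse_messages(n_agents: int, k: int) -> int:
    """Messages per round when each agent sends to only k chosen peers."""
    if not 0 <= k <= n_agents - 1:
        raise ValueError("k must be between 0 and n_agents - 1")
    return n_agents * k


# Compare the two schemes as the number of agents grows, with k fixed at 3.
for n in (5, 10, 20):
    print(n, broadcast_messages(n), sparse_messages(n, k=3))
```

With a fixed fan-out the message count grows linearly in the number of agents, while full broadcast grows quadratically, so the gap widens as agents are added.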

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MARFT: Multi-Agent Reinforcement Fine-Tuning

Researchers present MARFT (Multi-Agent Reinforcement Fine-Tuning), a framework for optimizing LLM-based multi-agent systems using reinforcement learning. The work introduces Flex-MG, a new Markov Game formulation, addresses key challenges in applying traditional MARL to collaborative AI systems, and provides an open-source implementation for advancing adaptive agentic systems.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Optimizing Diversity and Quality through Base-Aligned Model Collaboration

Researchers propose Base-Aligned Model Collaboration (BACo), an inference-time framework that dynamically combines base and aligned language models to improve both output diversity and quality simultaneously. The method uses token-level routing strategies based on uncertainty signals, achieving a 21.3% joint improvement in diversity-quality metrics without requiring expensive retraining or multi-pass decoding.
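One simple way to picture token-level routing on an uncertainty signal is an entropy threshold. The sketch below is an assumption, not BACo's routing rule: the choice of Shannon entropy, the threshold value, and the decision to hand uncertain tokens to the base model are all illustrative, and `pick_model` is a hypothetical name.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_model(aligned_probs: Sequence[float], threshold: float = 1.0) -> str:
    """Choose which model decodes the next token.

    Assumed rule: if the aligned model is uncertain (high entropy), let the
    base model pick the token; otherwise keep the aligned model's choice.
    """
    return "base" if entropy(aligned_probs) > threshold else "aligned"


confident = [0.97, 0.01, 0.01, 0.01]   # aligned model is nearly certain
uncertain = [0.25, 0.25, 0.25, 0.25]   # aligned model has no preference
choices = (pick_model(confident), pick_model(uncertain))
```

A per-token decision like this needs only one forward pass per model per step, which is consistent with the summary's point that no retraining or multi-pass decoding is required.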

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

Researchers introduce AnomSeer, a system that enhances multimodal large language models for time-series anomaly detection by grounding reasoning in precise structural details rather than coarse heuristics. Using a novel reinforcement learning approach called TimerPO, AnomSeer outperforms larger commercial models like GPT-4o in classification and localization accuracy while providing interpretable reasoning traces.

🧠 GPT-4

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

Researchers introduce Prompted Policy Optimization (PromptPO), a method using large language models as black-box policy optimizers for reinforcement learning tasks. The approach demonstrates competitive or superior performance to traditional RL algorithms in exploration-heavy and robotics domains while requiring fewer environment interactions, though it underperforms in continuous control tasks like MuJoCo.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

On the impact of retrieved content representations in RAG Pipelines

Researchers conducted a controlled study of how retrieved documents should be formatted for language models, rather than for human readers, within RAG pipelines. Testing 14 document representations across summarization, selection, and reformulation techniques, they found that answer retention (whether documents preserve answer-bearing content after transformation) is the primary driver of generation accuracy, while other factors like wording and length have minimal impact.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

SAC-Opt: Semantic Anchors for Iterative Correction in Optimization Modeling

Researchers introduce SAC-Opt, a framework that improves how large language models generate optimization code by grounding corrections in semantic accuracy rather than solver feedback alone. The approach achieves 7.7% average improvement in modeling accuracy across datasets, with gains up to 21.9% on complex problems, addressing silent logical errors in LLM-generated optimization models.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Researchers introduce RACE-Sched, an asynchronous AI framework that combines real-time symbolic heuristics with LLM-powered reasoning to solve dynamic job shop scheduling problems in industrial systems. The approach decouples fast reactive execution from slower deliberative optimization, enabling superior performance over deep reinforcement learning baselines while maintaining interpretability and millisecond-level response times.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Researchers introduce CoHyDE, an iterative co-training method that jointly optimizes a dense encoder and LLM rewriter to improve tool retrieval for AI agents. The approach outperforms single-component baselines by 2.5-8 percentage points on standard and vague queries, addressing the fundamental challenge of bridging colloquial user language with technical API vocabularies.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Researchers introduce NaRA (Noise-aware Low-Rank Adaptation), a parameter-efficient fine-tuning method designed specifically for diffusion large language models that adapts to noise levels during the denoising process. Unlike existing methods like LoRA that use static parameters, NaRA employs a hypernetwork to dynamically adjust low-rank matrices based on noise, achieving better performance on reasoning and code generation tasks.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Researchers introduce Agent-Radar, a training-free context management method that improves multi-agent LLM systems by dynamically filtering irrelevant information from long conversation histories. The technique uses temporal and spatial decay mechanisms to maintain focus on relevant context, achieving up to 7.64% performance improvements across five benchmarks.
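A temporal decay filter of the kind described above might look like the following sketch. It covers only the temporal part and is not Agent-Radar's formula: the exponential half-life, the precomputed relevance scores, and the names `decayed_score` and `filter_history` are assumptions for illustration.

```python
from typing import List, Tuple


def decayed_score(relevance: float, age_turns: int, half_life: float = 4.0) -> float:
    """Halve a message's relevance for every `half_life` turns of age."""
    return relevance * 0.5 ** (age_turns / half_life)


def filter_history(messages: List[Tuple[str, float]], keep: int) -> List[str]:
    """Keep the `keep` highest-scoring messages, returned oldest first.

    `messages` holds (text, relevance) pairs ordered oldest to newest; the
    relevance values are assumed to come from some upstream scorer.
    """
    newest = len(messages) - 1
    scored = [
        (decayed_score(rel, newest - i), i, text)
        for i, (text, rel) in enumerate(messages)
    ]
    top = sorted(scored, reverse=True)[:keep]
    # Restore chronological order so the remaining context still reads in sequence.
    return [text for _, _, text in sorted(top, key=lambda item: item[1])]


history = [
    ("old but key", 0.9),
    ("chitchat", 0.1),
    ("recent detail", 0.6),
    ("latest", 0.5),
]
kept = filter_history(history, keep=2)
```

An old message survives only if its relevance is high enough to outweigh its age, which is the trade-off a decay mechanism is meant to encode.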

AI · Neutral · arXiv – CS AI · May 29 · 6/10

TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints

TIMEGATE is a new policy framework that optimizes machine learning system adaptation by intelligently managing computational budgets across training, labeling, and evaluation cycles. The research demonstrates 2.3x efficiency gains in labeling versus training and achieves 66% evaluation-compute savings without compromising model accuracy, with validated results across tabular data and large language models like LLaMA-3.1-8B.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Researchers propose BaSE, a multi-armed bandit algorithm that optimizes how large language models allocate computational resources during evolutionary search tasks. By dynamically distributing LLM calls across parallel trajectories, BaSE improves mean fitness by 12.3% over existing baselines while addressing the reliability gap between reported best-case and typical run performance.
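For readers unfamiliar with bandit-style budget allocation, the sketch below uses the textbook UCB1 rule to split a fixed number of calls across parallel trajectories. UCB1 is a stand-in here, since the summary does not give BaSE's actual selection rule. The `pull` callback (one LLM call on a trajectory, returning a fitness-like reward) and the function name are hypothetical.

```python
import math
from typing import Callable, List


def ucb1_allocate(
    pull: Callable[[int], float], n_arms: int, budget: int, c: float = 2.0
) -> List[int]:
    """Spend `budget` pulls across `n_arms` arms with UCB1; return pulls per arm."""
    if n_arms < 1 or budget < n_arms:
        raise ValueError("budget must cover at least one pull per arm")
    counts = [0] * n_arms        # pulls spent on each arm so far
    totals = [0.0] * n_arms      # summed reward observed on each arm
    for t in range(1, budget + 1):
        if t <= n_arms:
            arm = t - 1          # initialization: try every arm once
        else:
            # Pick the arm with the best mean reward plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        totals[arm] += pull(arm)
        counts[arm] += 1
    return counts


# Toy example: three trajectories with fixed, made-up rewards.
toy_rewards = [0.2, 0.5, 0.9]
allocation = ucb1_allocate(lambda arm: toy_rewards[arm], n_arms=3, budget=200)
```

The exploration bonus shrinks as an arm accumulates pulls, so most of the budget drifts toward the trajectory that keeps paying off while weaker ones still get occasional checks.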

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

Researchers demonstrate an autoresearch framework where an AI agent autonomously optimizes LLM-based policy synthesis for multi-agent cooperation problems. The system discovers objective-dependent pipeline designs that outperform hand-crafted baselines, with fairness mechanisms emerging only when optimizing for equitable outcomes rather than efficiency.
