y0news

#llm-optimization News & Analysis

116 articles tagged with #llm-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

Researchers introduce CA-SQL, an advanced Text-to-SQL pipeline that dynamically allocates computational resources based on task complexity to improve LLM reasoning. The method achieves state-of-the-art performance on the BIRD benchmark's challenging tier using only GPT-4o-mini, outperforming larger models and demonstrating the efficiency gains possible through intelligent inference-time optimization.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · May 11 · 6/10

PerfCoder: Large Language Models for Interpretable Code Performance Optimization

Researchers introduce PerfCoder, a specialized family of large language models fine-tuned to generate high-performance optimized code through interpretable, customized strategies rather than brute-force scaling. The system outperforms existing models on code performance benchmarks and can generate human-readable optimization feedback that further improves outcomes when paired with larger models.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · May 9 · 6/10

Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

Researchers propose a top-down approach to automatic heuristic design for combinatorial optimization using large language models, where interpretable knowledge becomes the primary search object rather than executable code. This knowledge-first paradigm improves discovery efficiency and generalization across problems compared to traditional code-centric methods, suggesting future progress in AI-driven optimization depends on building reusable, explicit hypotheses.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Researchers introduce MASPO, a framework that automatically optimizes prompts across multi-agent LLM systems by scoring each agent's outputs on how well they enable downstream success rather than evaluating each agent in isolation. The approach uses evolutionary beam search to navigate prompt spaces and achieves an average accuracy improvement of 2.9% over existing methods across six diverse tasks.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio

Researchers present MoLS (Module-wise Learning Rate Scaling via SNR), a technique that automatically calibrates Adam optimizer updates across different modules in large language models by measuring signal-to-noise ratios. The method addresses optimization challenges caused by gradient heterogeneity across LLM components without requiring manual tuning, achieving performance comparable to hand-tuned approaches while maintaining compatibility with memory-efficient training.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Researchers introduce UniSD, a unified self-distillation framework that systematically improves large language model adaptation without requiring external teacher models. The framework combines multiple complementary mechanisms and demonstrates consistent gains of 5.4 points over baseline models across six benchmarks, advancing efficient LLM training techniques.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Owen-Shapley Policy Optimization: A Principled RL Algorithm for Generative Search LLMs

Researchers introduce Owen-Shapley Policy Optimization (OSPO), a reinforcement learning algorithm that improves how language models learn from feedback by attributing credit to individual tokens rather than treating entire sequences as atomic units. The method addresses a fundamental training gap in generative AI systems used for recommendation tasks, showing measurable improvements on real e-commerce datasets.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

When LLMs get significantly worse: A statistical approach to detect model degradations

Researchers propose a statistical framework using McNemar's test to reliably detect when large language model optimizations cause actual performance degradation versus noise. The method enables detection of even small accuracy drops (0.3%) while avoiding false alarms on theoretically lossless optimizations, with implementation provided for the LM Evaluation Harness.
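For readers unfamiliar with McNemar's test, the following is a minimal sketch of the exact (binomial) variant applied to paired per-example correctness from a baseline and an optimized model. It is not the paper's implementation and does not use the LM Evaluation Harness; the function name `mcnemar_exact` and the boolean-list inputs are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(baseline_correct, optimized_correct):
    """Two-sided exact McNemar test on paired correctness flags.

    Hypothetical helper for illustration: each list holds one boolean per
    evaluation example, in the same order for both models.
    Returns (b, c, p_value).
    """
    if len(baseline_correct) != len(optimized_correct):
        raise ValueError("paired samples must have equal length")

    pairs = list(zip(baseline_correct, optimized_correct))
    # Only discordant pairs carry information about a difference.
    b = sum(1 for base, opt in pairs if base and not opt)  # baseline right, optimized wrong
    c = sum(1 for base, opt in pairs if not base and opt)  # baseline wrong, optimized right

    n = b + c
    if n == 0:
        # The models agree on every example: no evidence of any change.
        return b, c, 1.0

    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return b, c, p_value


if __name__ == "__main__":
    baseline = [True, True, True, False, True, True, False, True]
    optimized = [True, False, True, False, False, True, False, True]
    b, c, p = mcnemar_exact(baseline, optimized)
    print(f"baseline-only correct: {b}, optimized-only correct: {c}, p = {p:.3f}")
```

Because the test looks only at examples where the two models disagree, it can flag a small accuracy drop that a comparison of aggregate accuracies with independent error bars might treat as noise.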

AI · Bullish · arXiv – CS AI · May 7 · 6/10

CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

Researchers introduce CAR (Confidence-Aware Reranking), a training-free framework that improves document ranking in Retrieval-Augmented Generation systems by measuring how much each document increases the language model's confidence rather than just relevance. Testing across multiple datasets shows consistent improvements in ranking quality and downstream generation performance.

AI · Bullish · arXiv – CS AI · May 7 · 6/10

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

CodeEvolve is an AI-driven evolutionary framework that automates code optimization by using LLMs, runtime profiling, and Monte Carlo Tree Search to identify and improve performance bottlenecks. The system achieves significant speedups (15.22x average) on enterprise Java codebases while maintaining functional correctness through rigorous validation pipelines.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

MemRouter is a new memory management system for conversational AI agents that uses lightweight embedding-based routing instead of expensive LLM generation to decide which conversation turns to store. The approach achieves 52.0 F1 score versus 45.6 for LLM-based alternatives while reducing latency from 970ms to 58ms, suggesting memory admission can be effectively learned through supervised classification rather than generative models.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations

Researchers propose VEROIC, a framework for optimizing inference costs in black-box LLM services by dynamically deciding when to allocate additional computation. The system uses partially observable reliability signals to balance response quality against computational expenses, achieving better cost-efficiency trade-offs than existing approaches.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

Researchers propose AdaRankLLM, an adaptive retrieval-augmented generation framework that dynamically filters irrelevant passages to reduce computational overhead while maintaining output quality. The study challenges whether adaptive retrieval remains necessary as language models grow more robust, finding that its value differs significantly between weaker and stronger models.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs

Researchers introduce wSSAS, a deterministic framework that enhances Large Language Model text categorization by combining hierarchical classification with signal-to-noise filtering to improve accuracy and reproducibility. Testing across Google Business, Amazon Product, and Goodreads reviews demonstrates significant improvements in clustering integrity and reduced categorization entropy.

🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads

Researchers present a systematic study of seven tactics for reducing cloud LLM token consumption in coding-agent workloads, demonstrating that local routing combined with prompt compression can achieve 45-79% token savings on certain tasks. The open-source implementation reveals that optimal cost-reduction strategies vary significantly by workload type, offering practical guidance for developers deploying AI coding agents at scale.

🏢 OpenAI
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

Researchers present the first systematic study of performance-energy trade-offs in multi-request LLM inference workflows, using NVIDIA A100 GPUs and vLLM/Parrot serving systems. The study identifies batch size as the most impactful optimization lever, though effectiveness varies by workload type, and reveals that workflow-aware scheduling can reduce energy consumption under power constraints.

🏢 Nvidia
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

TInR: Exploring Tool-Internalized Reasoning in Large Language Models

Researchers propose Tool-Internalized Reasoning (TInR), a framework that embeds tool knowledge directly into Large Language Models rather than relying on external tool documentation during reasoning. The TInR-U model uses a three-phase training pipeline combining knowledge alignment, supervised fine-tuning, and reinforcement learning to improve reasoning efficiency and performance across various tasks.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series

Researchers have optimized the Bielik v3 language models (7B and 11B parameters) by replacing universal tokenizers with Polish-specific vocabulary, addressing inefficiencies in morphological representation. This optimization reduces token fertility, lowers inference costs, and expands effective context windows while maintaining multilingual capabilities through advanced training techniques including supervised fine-tuning and reinforcement learning.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Researchers introduce MEDS, a memory-enhanced reward shaping framework that addresses a critical reinforcement learning failure mode where language models repeatedly generate similar errors. By tracking historical behavioral patterns and penalizing recurring mistake clusters, the method achieves consistent performance improvements across multiple datasets and models while increasing sampling diversity.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Researchers propose NExt, a nonlinear extrapolation framework that accelerates reinforcement learning with verifiable rewards (RLVR) for large language models by modeling low-rank parameter trajectories. The method reduces computational overhead by approximately 37.5% while remaining compatible with various RLVR algorithms, addressing a key bottleneck in scaling LLM training.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

An Iterative Utility Judgment Framework Inspired by Philosophical Relevance via LLMs

Researchers propose ITEM, an iterative utility judgment framework that enhances retrieval-augmented generation (RAG) systems by aligning with philosophical principles of relevance. The framework improves how large language models prioritize and process information from retrieval results, demonstrating measurable improvements across multiple benchmarks in ranking, utility assessment, and answer generation.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights

Researchers demonstrate that quantization and local inference techniques can reduce LLM energy consumption and carbon emissions by up to 45% without sacrificing performance. The findings address growing sustainability concerns surrounding generative AI deployment, offering practical optimization strategies for resource-constrained environments.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

Researchers introduce HiPRAG, a training methodology that improves agentic RAG systems by using fine-grained process rewards to optimize search decisions. The approach reduces inefficient search behaviors while achieving 65-67% accuracy across QA benchmarks, demonstrating that optimizing reasoning processes yields better performance than outcome-only training.

🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Researchers propose StaRPO, a reinforcement learning framework that improves large language model reasoning by incorporating stability metrics alongside task rewards. The method uses Autocorrelation Function and Path Efficiency measurements to evaluate logical coherence and goal-directedness, demonstrating improved accuracy and reasoning consistency across four benchmarks.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

Researchers present PETITE, a tutor-student multi-agent framework that enhances LLM problem-solving by assigning complementary roles to agents from the same model. Evaluated on coding benchmarks, the approach achieves comparable or superior accuracy to existing methods while consuming significantly fewer tokens, demonstrating that structured role-differentiated interactions can improve LLM performance more efficiently than larger models or heterogeneous ensembles.
