y0news

#llm-optimization News & Analysis

45 articles tagged with #llm-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠

Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights

Researchers demonstrate that quantization and local inference techniques can reduce LLM energy consumption and carbon emissions by up to 45% without sacrificing performance. The findings address growing sustainability concerns surrounding generative AI deployment, offering practical optimization strategies for resource-constrained environments.
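
Quantization is the core lever the paper credits for the energy savings. As a minimal illustrative sketch (not the paper's implementation), symmetric 8-bit weight quantization replaces each 32-bit float with an 8-bit integer plus one shared scale:

```python
# Minimal sketch of symmetric int8 weight quantization. All names here
# are illustrative, not taken from the paper.

def quantize_int8(weights):
    """Map float weights onto int8 levels with a single scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0                  # int8 range is [-127, 127]
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each 32-bit float becomes one 8-bit int: a 4x memory reduction,
# which is where much of the energy saving comes from.
```

Shrinking weights this way cuts memory traffic, which dominates inference energy on most hardware, and is what makes local (on-device) inference feasible at all.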

AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠

HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

Researchers introduce HiPRAG, a training methodology that improves agentic RAG systems by using fine-grained process rewards to optimize search decisions. The approach reduces inefficient search behaviors while achieving 65-67% accuracy across QA benchmarks, demonstrating that optimizing reasoning processes yields better performance than outcome-only training.
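
A hedged sketch of the process-reward idea: on top of the final-answer reward, each intermediate search call is scored, so the policy is pushed away from redundant searches. The weights and the necessity labels below are illustrative assumptions, not HiPRAG's actual reward design:

```python
# Toy process-level reward for agentic RAG: outcome reward plus
# per-step search-efficiency terms. Weights are illustrative.

def process_reward(answer_correct, search_steps, step_bonus=0.1):
    """search_steps: list of booleans, True if that search was necessary."""
    outcome = 1.0 if answer_correct else 0.0
    step_score = sum(step_bonus if needed else -step_bonus
                     for needed in search_steps)
    return outcome + step_score

# A correct answer with two useful searches and one redundant one scores
# lower than a leaner correct trajectory, discouraging wasteful search.
wasteful = process_reward(True, [True, True, False])
lean = process_reward(True, [True, True])
```

Outcome-only training would score both trajectories identically; the fine-grained terms are what let the policy learn when *not* to search.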

🧠 Llama
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Researchers propose StaRPO, a reinforcement learning framework that improves large language model reasoning by incorporating stability metrics alongside task rewards. The method uses Autocorrelation Function and Path Efficiency measurements to evaluate logical coherence and goal-directedness, demonstrating improved accuracy and reasoning consistency across four benchmarks.
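
An illustrative sketch of an autocorrelation-style stability signal: a reasoning trace whose step scores oscillate gets a low lag-1 autocorrelation, a steady trace a high one. The formula is the standard lag-1 ACF; using it as a reward term here is an assumption for the sketch, not the paper's exact method:

```python
# Lag-1 autocorrelation of a sequence of per-step scores, usable as a
# stability signal alongside task reward.

def lag1_autocorrelation(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 1.0               # a constant trace is maximally stable
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

steady = lag1_autocorrelation([0.1, 0.2, 0.3, 0.4, 0.5])
jittery = lag1_autocorrelation([0.1, 0.9, 0.0, 1.0, 0.1])
# steady > jittery, so a stability-augmented objective would favour
# the smoothly progressing trace over the oscillating one.
```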

AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠

Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

Researchers present PETITE, a tutor-student multi-agent framework that enhances LLM problem-solving by assigning complementary roles to agents from the same model. Evaluated on coding benchmarks, the approach achieves comparable or superior accuracy to existing methods while consuming significantly fewer tokens, demonstrating that structured role-differentiated interactions can improve LLM performance more efficiently than larger models or heterogeneous ensembles.
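
A toy sketch of one tutor-student round in the spirit of this framework: two agents backed by the same model take complementary roles, with the student attempting and the tutor critiquing before a retry. `llm` is a stand-in callable; the prompts and loop are illustrative assumptions, not PETITE's protocol:

```python
# Role-differentiated interaction between two agents from the same model.

def tutor_student_solve(llm, problem, rounds=2):
    attempt = llm(f"Solve: {problem}")
    for _ in range(rounds):
        hint = llm(f"As a tutor, critique this attempt: {attempt}")
        attempt = llm(f"Solve: {problem}\nTutor hint: {hint}")
    return attempt

# Stub LLM so the control flow is visible without a real model.
def stub_llm(prompt):
    return "tutor-hint" if prompt.startswith("As a tutor") else "student-answer"

final = tutor_student_solve(stub_llm, "2+2")
```

Because both roles reuse one model, the token cost is bounded by the number of rounds rather than by ensembling several heterogeneous models.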

AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠

Beyond Relevance: Utility-Centric Retrieval in the LLM Era

A research paper proposes a fundamental shift in how retrieval systems are evaluated, moving from traditional relevance-based metrics toward utility-centric optimization for large language models. This framework argues that retrieval effectiveness should be measured by its contribution to LLM-generated answer quality rather than document ranking alone, reflecting the structural changes introduced by retrieval-augmented generation (RAG) systems.
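
A minimal sketch of the utility-centric idea: score a retrieved passage by how much it improves the downstream answer, not by relevance labels. `answer_quality` is a stand-in for any answer metric (exact match, LLM-judge score, etc.); the toy metric below is an assumption for the sketch:

```python
# Utility of a passage = answer quality with it minus without it.

def utility(answer_quality, question, passage, reference):
    with_doc = answer_quality(question, passage, reference)
    without_doc = answer_quality(question, None, reference)
    return with_doc - without_doc

# Toy metric: 1.0 if the passage contains the reference answer, else 0.2.
def toy_quality(question, passage, reference):
    if passage and reference in passage:
        return 1.0
    return 0.2

gain = utility(toy_quality, "capital of France?",
               "Paris is the capital of France.", "Paris")
# A positive gain marks the passage as useful to the generator,
# not merely topically relevant.
```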

AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠

RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

Researchers introduce RecaLLM, a post-trained language model that addresses the 'lost-in-thought' phenomenon where retrieval performance degrades during extended reasoning chains. The model interleaves explicit in-context retrieval with reasoning steps and achieves strong performance on long-context benchmarks using training data significantly shorter than existing approaches.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠

Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search

Researchers introduce Chain-in-Tree (CiT), a framework that optimizes large language model tree search by selectively branching only when necessary rather than at every step. The approach reduces computational overhead by 75-85% on math reasoning tasks with minimal accuracy loss, making inference-time scaling more practical for resource-constrained deployments.
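
A sketch of the selective-branching idea under illustrative assumptions: sample several candidate steps only when the policy is uncertain, otherwise continue a plain sequential chain. `step_confidence` stands in for whatever branching-necessity signal the framework actually uses:

```python
# Branch only when needed; stay sequential on confident steps.

def search_step(step_confidence, branch_factor=4, threshold=0.8):
    """Return how many candidate continuations to sample at this step."""
    if step_confidence >= threshold:
        return 1             # confident: sequential chain, cheap
    return branch_factor     # uncertain: branch as in full tree search

# Over a trace where most steps are easy, most expansion cost is avoided.
confidences = [0.95, 0.9, 0.4, 0.99, 0.85, 0.3]
expansions = [search_step(c) for c in confidences]
total = sum(expansions)      # 12 expansions vs 24 for always-branching
```

When easy steps dominate, the saving approaches the branch factor itself, which is the intuition behind the reported 75-85% overhead reduction.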

AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠

Rectifying LLM Thought from Lens of Optimization

Researchers introduce RePro, a novel post-training technique that optimizes large language models' reasoning processes by framing chain-of-thought as gradient descent and using process-level rewards to reduce overthinking. The method demonstrates consistent performance improvements across mathematics, science, and coding benchmarks while mitigating inefficient reasoning behaviors in LLMs.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠

When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

Research reveals that adaptive reward mechanisms in AI-guided satellite scheduling systems actually hurt performance, with static reward weights achieving 342.1 Mbps versus dynamic weights at only 103.3 Mbps. The study found that fine-tuned LLMs performed poorly due to weight oscillation issues, while simpler MLP models achieved superior results of 357.9 Mbps.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents

Researchers have developed EcoThink, an energy-aware AI framework that reduces inference energy consumption by 40.4% on average while maintaining performance. The system uses adaptive routing to skip unnecessary computation for simple queries while preserving deep reasoning for complex tasks, addressing sustainability concerns in large language model deployment.
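
A hedged sketch of an energy-aware router in this spirit: easy queries skip the expensive reasoning path. The difficulty heuristic and path names are illustrative assumptions, not EcoThink's components:

```python
# Route simple queries to a cheap path, hard ones to deep reasoning.

def route(query, hard_keywords=("prove", "derive", "step by step")):
    """Pick a fast path unless the query looks like it needs deep reasoning."""
    q = query.lower()
    if any(k in q for k in hard_keywords) or len(q.split()) > 40:
        return "deep_reasoning"   # full chain-of-thought pass
    return "fast_path"            # single short forward pass

assert route("What year did BLOOM release?") == "fast_path"
assert route("Prove that the sum of two odd numbers is even") == "deep_reasoning"
```

The aggregate saving then depends on the traffic mix: the larger the share of simple queries, the closer the average cost gets to the cheap path alone.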

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs

Researchers propose APreQEL, an adaptive mixed precision quantization method for deploying large language models on edge devices. The approach optimizes memory, latency, and accuracy by applying different quantization levels to different layers based on their importance and hardware characteristics.
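
An illustrative sketch of adaptive bit assignment: give sensitive layers more bits and squeeze the rest, subject to an overall bit budget. The sensitivity scores and the greedy policy are assumptions for the sketch, not APreQEL's algorithm:

```python
# Greedy mixed-precision assignment under a total bit budget.

def assign_bits(sensitivities, budget_bits, high=8, low=4):
    """Upgrade the most sensitive layers from `low` to `high` bits."""
    n = len(sensitivities)
    bits = [low] * n
    spent = low * n
    for i in sorted(range(n), key=lambda i: -sensitivities[i]):
        if spent + (high - low) <= budget_bits:
            bits[i] = high
            spent += high - low
    return bits

# Four layers, budget for two upgrades beyond uniform 4-bit.
plan = assign_bits([0.9, 0.1, 0.7, 0.2], budget_bits=24)
# → [8, 4, 8, 4]: the two sensitive layers get the extra precision
```

A real system would derive sensitivities from calibration data and fold in per-layer latency on the target edge hardware; the budget mechanism stays the same.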

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10
🧠

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Researchers present a blueprint for evaluating and optimizing multi-agent conversational shopping assistants, addressing challenges in multi-turn interactions and tightly coupled AI systems. The paper introduces evaluation rubrics and two prompt-optimization strategies including a novel Multi-Agent Multi-Turn GEPA approach for system-level optimization.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

Researchers propose Draft-Thinking, a new approach to improve the efficiency of large language models' reasoning processes by reducing unnecessary computational overhead. The method achieves an 82.6% reduction in reasoning budget with only a 2.6% performance drop on mathematical problems, addressing the costly overthinking problem in current chain-of-thought reasoning.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment

Researchers introduce LittleBit-2, a new framework for extreme compression of large language models that achieves sub-1-bit quantization while maintaining performance comparable to 1-bit baselines. The method uses Internal Latent Rotation and Joint Iterative Quantization to solve geometric alignment issues in binary quantization, establishing new state-of-the-art results on Llama-2 and Llama-3 models.
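
For background, the classic 1-bit baseline that sub-1-bit methods start from binarizes weights to their sign plus one per-row float scale. The code below is only that standard baseline; LittleBit-2's Internal Latent Rotation and Joint Iterative Quantization refine it and are not reproduced here:

```python
# Classic 1-bit weight quantization: w ≈ alpha * sign(w).

def binarize_row(row):
    """alpha = mean(|w|) minimizes L2 error for sign-based binarization."""
    alpha = sum(abs(w) for w in row) / len(row)
    signs = [1 if w >= 0 else -1 for w in row]
    return alpha, signs

def reconstruct(alpha, signs):
    return [alpha * s for s in signs]

alpha, signs = binarize_row([0.5, -0.3, 0.2, -0.4])
approx = reconstruct(alpha, signs)
# alpha ≈ 0.35; every weight collapses to ±0.35, i.e. one bit each
# plus one shared scale — the geometry this baseline loses is what
# rotation-based alignment methods try to recover.
```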

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Researchers introduce AdaBlock-dLLM, a training-free optimization technique for diffusion-based large language models that adaptively adjusts block sizes during inference based on semantic structure. The method addresses limitations in conventional fixed-block semi-autoregressive decoding, achieving up to 5.3% accuracy improvements under the same throughput budget.
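
A hedged sketch of confidence-driven block sizing for semi-autoregressive decoding: extend the current block while the decoder is confident, and cut it at the first low-confidence position as a proxy for a semantic boundary. The threshold and the confidence signal are illustrative assumptions, not AdaBlock-dLLM's actual criterion:

```python
# Grow the decoding block until per-position confidence drops.

def adaptive_block_size(confidences, min_size=2, max_size=8, threshold=0.7):
    size = 0
    for c in confidences[:max_size]:
        if size >= min_size and c < threshold:
            break
        size += 1
    return max(size, min_size)

# High confidence throughout: take a large block in one round.
assert adaptive_block_size([0.9] * 10) == 8
# Confidence collapses after three tokens: stop the block early.
assert adaptive_block_size([0.9, 0.8, 0.9, 0.3, 0.2]) == 3
```

Fixed-block decoding must pick one size for both cases; adapting per block is what buys accuracy at the same throughput budget.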

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats

Researchers evaluated HiFloat (HiF8 and HiF4) formats for low-bit inference on Ascend NPUs, finding them superior to integer formats for high-variance data and preventing accuracy collapse in 4-bit regimes. The study demonstrates HiFloat's compatibility with existing quantization frameworks and its potential for efficient large language model inference.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10
🧠

Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling

Researchers propose Sem-MoE, a framework built on "semantic parallelism" that significantly improves the efficiency of large language model inference by optimizing how computational tasks are distributed across multiple devices. The system reduces communication overhead between devices by collocating frequently-used model components with their corresponding data, achieving superior throughput compared to existing solutions.

AI · Bullish · Hugging Face Blog · Apr 16 · 6/10
🧠

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

The article discusses prefill and decode techniques for optimizing Large Language Model (LLM) performance when handling concurrent requests. These methods aim to improve efficiency and reduce latency in AI systems serving multiple users simultaneously.
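
The reason the two phases are scheduled differently: prefill processes a whole prompt in one compute-heavy pass, while decode emits one token per step, so servers batch many requests' decode steps together. The toy scheduler below illustrates that pattern under simplifying assumptions (each request finishes after its scripted token count); it is not any specific serving system:

```python
# Toy continuous-batching loop: admit prefills, then run batched decode.
from collections import deque

def simulate(requests, max_batch=4):
    """requests: dict rid -> number of tokens to generate. Returns step log."""
    waiting = deque(requests)
    remaining = dict(requests)
    in_flight, log = [], []
    while waiting or in_flight:
        # Admit new prompts: each needs one prefill pass over its prompt.
        while waiting and len(in_flight) < max_batch:
            rid = waiting.popleft()
            log.append(("prefill", rid))
            in_flight.append(rid)
        # One decode step produces one token for every in-flight request.
        log.append(("decode", tuple(in_flight)))
        for rid in list(in_flight):
            remaining[rid] -= 1
            if remaining[rid] == 0:
                in_flight.remove(rid)   # finished requests free batch slots
    return log

log = simulate({"a": 2, "b": 1})
# a and b share decode batches; b leaves after one step, a decodes once more.
```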

AI · Neutral · arXiv – CS AI · 6d ago · 5/10
🧠

Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search

Researchers introduce MSPA-CQR, a machine learning approach that improves conversational query rewriting by aligning preferences across three dimensions: query rewriting, passage retrieval, and response generation. The method uses self-consistent preference data and direct preference optimization to generate more diverse and effective rewritten queries in conversational search systems.

AI · Bullish · Hugging Face Blog · Mar 28 · 5/10
🧠

Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator

The article discusses optimizing BLOOMZ, a large language model, for fast inference on Intel's Habana Gaudi2 accelerator hardware. This technical development focuses on improving AI model performance and efficiency through specialized hardware acceleration.

โ† PrevPage 2 of 2