#hyperparameter-tuning News & Analysis

12 articles tagged with #hyperparameter-tuning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Researchers propose a gradient-based bilevel optimization method that automatically learns composite loss weights during pretraining by aligning gradients with downstream objectives. The approach reduces hyperparameter tuning overhead to ~30% above baseline training cost while matching or exceeding manually tuned baselines across event-sequence and computer vision tasks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression

Researchers propose KORE (Kolmogorov-optimal Order-aware Resolution Estimation), a method that solves for optimal hyperparameters in spline regression analytically rather than through expensive grid search. The approach reduces computational cost by ~8x while matching exhaustive cross-validation performance across high-dimensional datasets.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

A Multi-Agent system for Multi-Objective constrained optimization

Researchers introduce MAMO, a multi-agent reinforcement learning system that autonomously optimizes reward weight selection for constrained optimization problems in dynamic environments. This addresses a critical limitation in current RL approaches where manual tuning of penalty weights significantly impacts policy performance and constraint adherence.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance

Researchers conducted a comprehensive ablation study of the Tree-Structured Parzen Estimator (TPE), a widely used Bayesian optimization method, to clarify the role of each control parameter and improve its empirical performance. The study provides actionable recommendations for parameter tuning in machine learning frameworks like Hyperopt and Optuna, with implementations now available through OptunaHub.
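
As a rough illustration of the core TPE idea (not the paper's implementation, and not the Hyperopt or Optuna API), the sketch below splits past trials into a "good" and a "bad" group at a quantile `gamma`, fits a simple Gaussian kernel density to each group, and suggests the candidate with the highest density ratio l(x)/g(x). The function names, the fixed `bandwidth`, the candidate count, and the toy objective are all assumptions made for illustration.

```python
import math
import random


def kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate over the given points."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density


def tpe_suggest(trials, gamma=0.25, bandwidth=0.1, n_candidates=24, rng=random):
    """Suggest the next x in [0, 1] from (x, loss) trials; lower loss is better."""
    ordered = sorted(trials, key=lambda t: t[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]] or good
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Sample candidates near good points and keep the one maximizing l(x) / g(x).
    candidates = [
        min(1.0, max(0.0, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


def objective(x):
    """Toy objective (hypothetical): minimized at x = 0.3."""
    return (x - 0.3) ** 2


def run_tpe(n_trials=40, n_startup=10, seed=0):
    """Run random startup trials, then TPE-style suggestions; return the best trial."""
    rng = random.Random(seed)
    trials = []
    for i in range(n_trials):
        x = rng.random() if i < n_startup else tpe_suggest(trials, rng=rng)
        trials.append((x, objective(x)))
    return min(trials, key=lambda t: t[1])
```

In this toy version, `gamma`, `bandwidth`, and `n_candidates` play the role of the kind of control parameters the study examines; production libraries expose their own, differently named settings.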

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA

Researchers demonstrate that batch size is a critical hyperparameter systematically overlooked in LoRA fine-tuning evaluations, causing conflicting performance claims across variants. A cost-efficient tuning strategy reveals batch size's substantial impact on optimal model performance, reconciling previous contradictory results and establishing clearer evaluation standards.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

Researchers propose a Bayesian Optimization framework that uses pre-trained Large Language Models to efficiently search for optimal LoRA (Low-Rank Adaptation) hyperparameters by encoding domain knowledge as natural language prompts. The method discovers high-performing configurations in ~30 iterations versus 45,000 combinations, achieving 20% performance improvements while significantly reducing computational costs.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems

Researchers have released LLMSYS-HPOBench, the first comprehensive benchmark suite for hyperparameter optimization in real-world LLM systems, containing 364,450 configurations across 932 settings with multiple fidelity factors and cost metrics. The dataset addresses gaps in existing AutoML benchmarks by capturing the unprecedented complexity of optimizing both AI and non-AI components in production language model systems.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

Researchers introduce CDS4RAG, a novel optimization framework that improves Retrieval-Augmented Generation systems by cyclically optimizing retriever and generator hyperparameters separately rather than treating them as a monolithic unit. The method achieves up to 1.54x improvements in generation quality while demonstrating faster convergence across multiple benchmarks and language models.
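
The general idea of alternating between two hyperparameter groups, rather than searching them jointly, can be sketched as below. This is a plain coordinate-style grid search under assumed names, not the CDS4RAG algorithm itself: `top_k` stands in for a retriever hyperparameter, `temperature` for a generator hyperparameter, and `toy_score` is a made-up quality function.

```python
def cyclic_search(score, retriever_grid, generator_grid, cycles=3):
    """Alternate between tuning the retriever and generator settings.

    Each cycle fixes one group at its current best value and searches the other.
    """
    r, g = retriever_grid[0], generator_grid[0]
    for _ in range(cycles):
        # Tune the retriever setting with the generator setting held fixed.
        r = max(retriever_grid, key=lambda cand: score(cand, g))
        # Tune the generator setting with the retriever setting held fixed.
        g = max(generator_grid, key=lambda cand: score(r, cand))
    return r, g, score(r, g)


def toy_score(top_k, temperature):
    """Hypothetical quality score with a mild interaction between the two groups."""
    return -(top_k - 5) ** 2 - 10 * (temperature - 0.3) ** 2 - 0.5 * top_k * temperature


best_k, best_t, best_score = cyclic_search(
    toy_score, retriever_grid=[1, 3, 5, 8], generator_grid=[0.0, 0.3, 0.7, 1.0]
)
```

Each cycle evaluates the sum of the two grid sizes rather than their product, which is where an alternating scheme can save evaluations compared with a joint grid.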

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

Researchers present HG-MS, a novel bilevel optimization method that handles cases where lower-level problems have multiple solutions along a manifold rather than a single optimum. The work provides theoretical guarantees for convergence while maintaining computational efficiency through pseudoinverse-based calculations, with practical applications demonstrated in LLM fine-tuning.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

Researchers present a theoretical framework showing how mini-batch noise in Adam training affects the implicit bias toward sharper or flatter regions of the loss landscape. They find that the optimal momentum hyperparameters shift with batch size: small batches favor the default (β₁, β₂) = (0.9, 0.999) settings, while larger batches benefit from β₁ and β₂ values that are closer to each other.
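
For readers unfamiliar with where β₁ and β₂ enter, here is a minimal scalar sketch of the standard Adam update with a configurable `betas` pair. It only shows the mechanics of the two momentum hyperparameters; it does not reproduce the paper's analysis, and the toy objective, learning rate, and helper names are assumptions.

```python
import math


def adam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    beta1, beta2 = betas
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(betas, steps=200, lr=0.05):
    """Minimize the toy objective f(x) = x^2 from x = 1.0 with the given betas."""
    x, state = 1.0, {"t": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        x = adam_step(x, 2 * x, state, lr=lr, betas=betas)
    return x
```

Calling `minimize((0.9, 0.999))` uses the default pair mentioned above; a pair such as `(0.95, 0.95)` is one hypothetical example of values that are closer to each other, not a setting taken from the paper.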

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

In-Context Decision Making for Optimizing Complex AutoML Pipelines

Researchers propose PS-PFN, an advanced AutoML method that extends traditional algorithm selection and hyperparameter optimization to handle modern ML pipelines with fine-tuning and ensembling. Using posterior sampling and prior-data fitted networks for in-context learning, the approach outperforms existing bandit and AutoML strategies on benchmark tasks.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms

Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.
