#hyperparameter-optimization News & Analysis

18 articles tagged with #hyperparameter-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow

AutoSurrogate is an LLM-driven framework that automates the construction of deep learning surrogate models for subsurface flow simulation, letting domain scientists without machine learning expertise build high-quality models from natural-language instructions. The system autonomously handles data profiling, architecture selection, hyperparameter optimization, and quality assessment, manages failure modes along the way, and outperforms expert-designed baselines on geological carbon storage tasks.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

ASAP: Agent-System Co-Design for Wall-Clock-Centered Auto HPO Research for ML Experiments

Researchers introduce ASAP, an agent-system co-design that leverages LLMs to coordinate multiple hyperparameter optimization tools while reducing wall-clock execution time through architectural innovations like KV-cache reuse and speculation parallelism. The approach addresses fundamental limitations in current LLM-based HPO methods by treating the language model as an orchestrator rather than a replacement tool, demonstrating consistent performance improvements across diverse ML tasks.

AI · Bearish · arXiv – CS AI · Jun 23 · 6/10

When Is an LLM Worth It for Hyperparameter Optimization? A Budget-Matched Study on Tabular Data Finds the Warm-Start Is a Default Configuration, Not the Model

A rigorous empirical study challenges claims that large language models improve hyperparameter optimization for tabular data, finding that the apparent advantage of LLM advisors comes entirely from the fixed default configuration they use as a warm start, not from the model itself. Classical search methods warm-started with the same configuration match or outperform LLM approaches within a handful of evaluations, suggesting that LLM-based HPO systems offer no meaningful generalization benefit.
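
The budget-matching idea can be illustrated with a minimal random-search sketch in which the warm-start configuration counts as one of the evaluations, so any two optimizers given the same default are compared on equal terms. The function name, toy objective, and search space below are hypothetical; this is not the study's code.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Config = Dict[str, Any]

def budget_matched_random_search(
    objective: Callable[[Config], float],
    space: Dict[str, Sequence[Any]],
    budget: int,
    warm_start: Optional[Config] = None,
    seed: int = 0,
) -> Tuple[float, Config]:
    """Minimise `objective` using exactly `budget` evaluations (hypothetical helper).

    A warm-start configuration, if given, is evaluated first and counts against
    the budget, so comparisons with other optimisers stay budget-matched.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    configs: List[Config] = [dict(warm_start)] if warm_start is not None else []
    while len(configs) < budget:
        # Draw each hyperparameter independently and uniformly from its candidates.
        configs.append({name: rng.choice(list(values)) for name, values in space.items()})
    scored = [(objective(cfg), cfg) for cfg in configs]
    return min(scored, key=lambda item: item[0])


# Hypothetical usage: a toy loss whose optimum happens to be the default config.
space = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.05, 0.1, 0.3]}
default = {"max_depth": 6, "learning_rate": 0.1}

def toy_loss(cfg: Config) -> float:
    return abs(cfg["max_depth"] - 6) + abs(cfg["learning_rate"] - 0.1)

print(budget_matched_random_search(toy_loss, space, budget=10, warm_start=default))
```

When the default is already near-optimal, the warm-started search is done after one evaluation regardless of what proposes the remaining candidates, which is the confound the study isolates.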

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Gradient-Descent Steps to Success over Mean Accuracy: A Paradigm Shift for ML

Researchers propose evaluating machine learning models based on computational effort (gradient descent steps to reach target accuracy) rather than maximum accuracy alone. The study reveals that larger learning rates, phase transitions in training strategy, and restart-based approaches optimize both generalization and computational efficiency, offering a new framework for AutoML and model selection.
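
One plausible way to compute a steps-to-target metric is sketched below, assuming accuracy is logged at a fixed interval of gradient steps. The function and the two curves are hypothetical illustrations, not the paper's protocol.

```python
from typing import Optional, Sequence

def steps_to_target(accuracy_curve: Sequence[float], target: float, eval_every: int = 1) -> Optional[int]:
    """Return the number of gradient steps after which accuracy first reaches `target`.

    `accuracy_curve[i]` is assumed to be measured after (i + 1) * eval_every steps.
    Returns None if the target is never reached; a ranking should treat that as a failure.
    """
    for i, acc in enumerate(accuracy_curve):
        if acc >= target:
            return (i + 1) * eval_every
    return None


# Hypothetical curves logged every 100 steps.
run_a = [0.50, 0.70, 0.80, 0.90, 0.93]  # higher final accuracy
run_b = [0.60, 0.85, 0.91, 0.91, 0.91]  # reaches 0.90 sooner
print(steps_to_target(run_a, 0.90, eval_every=100))  # 400
print(steps_to_target(run_b, 0.90, eval_every=100))  # 300
```

Ranked by final accuracy, `run_a` wins; ranked by steps to 0.90, `run_b` wins, which is the kind of reordering an effort-based metric introduces.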

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining

Researchers present a staged-promotion protocol for efficiently screening machine learning configurations during micro-pretraining, using fixed budget increments across heterogeneous hardware to reduce experimental costs while mitigating the risk of selecting configurations that perform well only at tiny scales. The study demonstrates that early-stage rankings are unstable across hardware types, but a frozen promotion rule successfully identified a consistent top performer while reducing total GPU-hours from 432 to 169.2.
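
A staged promotion rule of this kind resembles successive halving: every surviving configuration receives the same budget at each stage, and only a fixed fraction is promoted. The sketch below assumes a lower-is-better score and a promotion fraction chosen before any results are seen (the "frozen" rule); the names, stage budgets, and keep fraction of one half are hypothetical, not the study's settings. For reference, the reported drop from 432 to 169.2 GPU-hours saves 262.8 GPU-hours, roughly 61%.

```python
from typing import Callable, List, Sequence, Tuple

def staged_promotion(
    configs: Sequence[str],
    evaluate: Callable[[str, float], float],
    stage_budgets: Sequence[float],
    keep_fraction: float = 0.5,
) -> Tuple[List[str], float]:
    """Screen configurations with a promotion rule fixed before any results are seen.

    At each stage every surviving config is trained for that stage's budget and
    scored (lower is better); only the top `keep_fraction` moves on.
    Returns the final survivors and the total budget spent.
    """
    survivors = list(configs)
    spent = 0.0
    for budget in stage_budgets:
        scores = {cfg: evaluate(cfg, budget) for cfg in survivors}
        spent += budget * len(survivors)
        n_keep = max(1, int(len(survivors) * keep_fraction))
        survivors = sorted(survivors, key=scores.__getitem__)[:n_keep]
    return survivors, spent


# Hypothetical example: 8 configs, stage budgets of 1, 2 and 4 GPU-hours each.
quality = {f"cfg{i}": float(i) for i in range(8)}
best, cost = staged_promotion(list(quality), lambda c, b: quality[c] / b, [1, 2, 4])
print(best, cost)  # ['cfg0'] 24.0
```

Training all eight configurations at the final 4 GPU-hour budget would cost 32 GPU-hours here; the staged rule spends 24 while still returning the best configuration, provided early rankings are not misleading.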

AI · Bullish · arXiv – CS AI · Jun 10 · 6/10

Importance-Aware Scheduling for High-Dimensional Hyperparameter Optimization

Researchers propose Greedy Importance First (GIF), a novel hyperparameter optimization strategy that uses importance-based scheduling to improve efficiency in high-dimensional ML/DL model training. The method outperforms established optimizers like TPE and BOHB on high-dimensional benchmarks by focusing computational resources on the most impactful hyperparameters.
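
The general idea of spending search effort on the most influential hyperparameters first can be sketched as a greedy coordinate search ordered by importance. This is a simplified stand-in, not the GIF algorithm; the importance scores are assumed to come from some external analysis (for example an ANOVA-style method), and all names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Config = Dict[str, float]

def importance_first_search(
    objective: Callable[[Config], float],
    space: Dict[str, Sequence[float]],
    importance: Dict[str, float],
    start: Config,
) -> Config:
    """Greedy coordinate search that sweeps the most important hyperparameter first.

    Each hyperparameter is swept over its candidate values while the others stay
    at their current best values; lower objective values are better.
    """
    best = dict(start)
    best_loss = objective(best)
    for name in sorted(space, key=importance.__getitem__, reverse=True):
        for value in space[name]:
            trial = {**best, name: value}
            trial_loss = objective(trial)
            if trial_loss < best_loss:
                best, best_loss = trial, trial_loss
    return best


# Hypothetical importance scores, e.g. from an ANOVA-style analysis of earlier runs.
space = {"lr": [0.001, 0.01, 0.1, 1.0], "wd": [0.0, 0.01, 0.1]}
importance = {"lr": 0.9, "wd": 0.1}

def quadratic_loss(cfg: Config) -> float:
    return 10 * (cfg["lr"] - 0.1) ** 2 + (cfg["wd"] - 0.01) ** 2

print(importance_first_search(quadratic_loss, space, importance, start={"lr": 1.0, "wd": 0.1}))
# {'lr': 0.1, 'wd': 0.01}
```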

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces

Researchers propose conditional PED-ANOVA (condPED-ANOVA), a new framework for measuring hyperparameter importance in machine learning search spaces where parameters have conditional dependencies. The method addresses limitations of existing approaches by accurately handling cases where a hyperparameter's presence or domain depends on other hyperparameters, improving the reliability of AutoML systems.
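
To see why conditional spaces complicate importance analysis, the sketch below samples from a hypothetical hierarchical space in which `momentum` exists only for SGD and `beta2` only for Adam, then groups trials by which hyperparameters were active. It only illustrates the data shape such a method must handle; it does not implement condPED-ANOVA.

```python
import random
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Tuple

Config = Dict[str, Any]
Trial = Tuple[Config, float]

def sample_config(rng: random.Random) -> Config:
    """Sample from a hypothetical hierarchical space.

    `momentum` is only defined when optimizer == "sgd" and `beta2` only when
    optimizer == "adam", so the set of active hyperparameters varies by trial.
    """
    cfg: Config = {"optimizer": rng.choice(["sgd", "adam"]), "lr": 10 ** rng.uniform(-4, -1)}
    if cfg["optimizer"] == "sgd":
        cfg["momentum"] = rng.uniform(0.0, 0.99)
    else:
        cfg["beta2"] = rng.choice([0.99, 0.999])
    return cfg

def group_by_active_set(trials: List[Trial]) -> Dict[FrozenSet[str], List[Trial]]:
    """Group (config, loss) pairs by which hyperparameters were actually present."""
    groups: Dict[FrozenSet[str], List[Trial]] = defaultdict(list)
    for cfg, loss in trials:
        groups[frozenset(cfg)].append((cfg, loss))
    return dict(groups)


# Hypothetical trials with random placeholder losses.
rng = random.Random(0)
trials = [(cfg, rng.random()) for cfg in (sample_config(rng) for _ in range(20))]
for active, members in group_by_active_set(trials).items():
    print(sorted(active), len(members))
```

Treating `momentum` as an ordinary column would force an arbitrary fill value for every Adam trial, which is the kind of distortion a conditional importance measure has to avoid.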

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Rethinking Evaluation Paradigms in IBP-based Certified Training

Researchers propose a new evaluation framework for certified neural network training methods using Pareto front comparisons to assess the natural-certified accuracy trade-off. By applying automated hyperparameter optimization across methods, they reveal significant undertuning in prior work and establish new performance benchmarks that challenge assumptions about state-of-the-art certified robustness.
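
Comparing methods by Pareto front means keeping only the (natural accuracy, certified accuracy) pairs that no other run beats on both axes at once. A minimal sketch with made-up accuracy values and a hypothetical function name:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (natural accuracy, certified accuracy), higher is better

def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point, sorted by natural accuracy."""
    front = set()
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            front.add(p)
    return sorted(front)


# Made-up (natural, certified) accuracies for several tuned runs.
runs = [(0.80, 0.30), (0.75, 0.35), (0.70, 0.30), (0.85, 0.20), (0.78, 0.29)]
print(pareto_front(runs))  # [(0.75, 0.35), (0.8, 0.3), (0.85, 0.2)]
```

A method whose single reported run sits below another method's front is only worse at that operating point, which is why tuning each method across the trade-off matters before comparing them.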

🏢 Meta

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition

Researchers propose Self-Adaptive Monotonic Normalization (SAMN), a hyperparameter-friendly approach to improve long-tailed recognition in deep learning. The method eliminates the need for manual parameter tuning while achieving state-of-the-art performance by enforcing monotonic constraints on per-class weight norms during classifier retraining.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

Researchers propose c-TPE, an enhanced Bayesian optimization method that extends the Tree-structured Parzen Estimator to handle inequality constraints in hyperparameter optimization. The method addresses practical real-world limitations like memory and latency constraints while maintaining strong performance, demonstrating superiority over existing approaches across 81 expensive optimization problems.
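
A heavily simplified, one-dimensional illustration of constraint-aware TPE follows: candidates are scored by the usual TPE density ratio between good and bad observations, multiplied by a second ratio between feasible and infeasible observations. This is an assumption-laden sketch (fixed Gaussian kernel bandwidth, a single constraint, hypothetical function names `_kde` and `constrained_tpe_suggest`), not the c-TPE algorithm as published.

```python
import math
from typing import List, Sequence, Tuple

def _kde(x: float, points: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate at x, floored so ratios never divide by zero."""
    if not points:
        return 1e-12
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return max(total / norm, 1e-12)

def constrained_tpe_suggest(
    history: List[Tuple[float, float, bool]],
    candidates: Sequence[float],
    gamma: float = 0.25,
    bandwidth: float = 0.1,
) -> float:
    """Pick the candidate maximising (objective density ratio) * (feasibility density ratio).

    `history` holds (x, loss, feasible) triples for one hyperparameter; lower loss is better.
    """
    ranked = sorted(history, key=lambda h: h[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _, _ in ranked[:n_good]]
    bad = [x for x, _, _ in ranked[n_good:]]
    feasible = [x for x, _, ok in history if ok]
    infeasible = [x for x, _, ok in history if not ok]

    def score(x: float) -> float:
        objective_ratio = _kde(x, good, bandwidth) / _kde(x, bad, bandwidth)
        if not infeasible:  # no violations observed yet: constraint term is neutral
            return objective_ratio
        feasibility_ratio = _kde(x, feasible, bandwidth) / _kde(x, infeasible, bandwidth)
        return objective_ratio * feasibility_ratio

    return max(candidates, key=score)


# Hypothetical history: loss falls as x grows, but only x <= 0.6 satisfies the constraint.
xs = [i / 10 for i in range(11)]
history = [(x, 1.0 - x, x <= 0.6) for x in xs]
print(constrained_tpe_suggest(history, [0.05, 0.55, 0.95]))
```

When the best observed objective values all violate the constraint, the feasibility ratio pulls the suggestion back toward the feasible side of the boundary instead of deeper into the infeasible region.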

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Weight Decay Improves Language Model Plasticity

Researchers demonstrate that weight decay during language model pretraining significantly improves model plasticity—the ability to adapt to downstream tasks through fine-tuning. The study reveals counterintuitive findings where higher weight decay produces weaker base models but stronger performance after task-specific training, challenging conventional approaches to hyperparameter optimization.
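
For readers less familiar with the hyperparameter itself, weight decay shrinks every weight toward zero in proportion to its size at each step. The plain-SGD sketch below, with a hypothetical function name, shows only that mechanism; it says nothing about the plasticity effect the study reports and is not meant to reproduce its training setup. With plain SGD, L2 regularization and weight decay coincide; adaptive optimizers such as AdamW apply the decay separately from the gradient-based update.

```python
from typing import List

def sgd_step_with_weight_decay(weights: List[float], grads: List[float], lr: float, weight_decay: float) -> List[float]:
    """One plain-SGD step: w <- w - lr * (grad + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero gradient, each step multiplies every weight by (1 - lr * weight_decay).
w = [1.0, -2.0]
for _ in range(100):
    w = sgd_step_with_weight_decay(w, [0.0, 0.0], lr=0.1, weight_decay=0.1)
print(w)  # both weights have shrunk by a factor of about 0.99 ** 100
```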

AI · Neutral · arXiv – CS AI · May 29 · 6/10

RAISE: RAG Design as an Architecture Search Problem

Researchers introduce RAISE, a comprehensive framework for optimizing retrieval-augmented generation (RAG) systems by treating architecture design as a hyperparameter search problem. The study evaluates 13 optimization algorithms across seven datasets, revealing that RAG performance is highly task-dependent and no single optimization strategy universally outperforms others, highlighting the need for systematic rather than heuristic-based configuration approaches.
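
Treating RAG design as hyperparameter search starts with writing the pipeline choices down as a search space. The dimensions and values below are hypothetical, not RAISE's actual space; even this small example yields 54 configurations, which is why a systematic optimizer becomes more attractive than hand tuning.

```python
import itertools
from typing import Any, Dict, Iterator, List

# Hypothetical RAG design choices; RAISE's actual search space is not reproduced here.
RAG_SPACE: Dict[str, List[Any]] = {
    "chunk_size": [256, 512, 1024],
    "retriever": ["bm25", "dense", "hybrid"],
    "top_k": [3, 5, 10],
    "reranker": [None, "cross-encoder"],
}

def iter_rag_configs(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every pipeline configuration in the Cartesian product of the choices."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_rag_configs(RAG_SPACE))
print(len(configs))  # 54
print(configs[0])    # {'chunk_size': 256, 'retriever': 'bm25', 'top_k': 3, 'reranker': None}
```

Exhaustive enumeration only works for toy spaces like this one; once continuous or conditional choices enter, the same dictionary would instead be handed to a sampler such as random search or TPE.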

🏢 Meta

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Improving Evaluation of Recombination-based Cartesian Genetic Programming

Researchers demonstrate that recombination-based operators in Cartesian Genetic Programming can achieve competitive performance when combined with proper hyperparameter optimization, challenging the long-held assumption that mutation-only approaches are superior for symbolic regression tasks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice

Researchers demonstrate that small-scale proxy models commonly used by AI companies to evaluate data curation strategies produce unreliable conclusions because optimal training configurations are data-dependent. They propose using reduced learning rates in proxy model training as a simple, cost-effective solution that better predicts full-scale model performance across diverse data recipes.

🏢 Meta

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

Researchers systematically evaluated how sampling temperature and prompting strategy affect extended reasoning performance in large language models, finding that zero-shot prompting peaks at moderate temperatures (T = 0.4–0.7), while chain-of-thought prompting performs better at the low and high extremes. The study also finds that the benefit of extended reasoning grows substantially at higher temperatures, suggesting that T = 0 is suboptimal for reasoning tasks.
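
Temperature rescales the logits before the softmax: values below 1 sharpen the distribution toward the top token, values above 1 flatten it, and T = 0 is conventionally treated as greedy decoding. A minimal sketch with hypothetical function names and made-up logits:

```python
import math
import random
from typing import List, Sequence

def temperature_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Sample a token index; temperature == 0 is treated as greedy argmax decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
for t in (0.4, 0.7, 1.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
print(sample_token(logits, 0, random.Random(0)))  # 0 (greedy)
```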

🧠 Grok

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport

Researchers introduce Hyperparameter Trajectory Inference (HTI), a method to predict how neural networks behave with different hyperparameter settings without expensive retraining. The approach uses conditional Lagrangian optimal transport to create surrogate models that approximate neural network outputs across various hyperparameter configurations.

AI · Neutral · arXiv – CS AI · Mar 11 · 5/10

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

Researchers introduce the Overfitting-Underfitting Indicator (OUI) to analyze learning rate sensitivity in PPO reinforcement learning systems. The metric can identify problematic learning rates early in training by measuring neural activation patterns, enabling more efficient hyperparameter screening without full training runs.

AI · Neutral · Hugging Face Blog · Nov 2 · 4/10 · 6

Hyperparameter Search with Transformers and Ray Tune

The post covers hyperparameter search for Hugging Face Transformers models with Ray Tune, a library for distributed hyperparameter tuning. Because Ray Tune can run many trials in parallel across multiple GPUs or machines, the search scales with the available compute.