AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose a gradient-based bilevel optimization method that automatically learns composite loss weights during pretraining by aligning gradients with downstream objectives. The approach reduces hyperparameter tuning overhead to ~30% above baseline training cost while matching or exceeding manually tuned baselines across event-sequence and computer vision tasks.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers have conducted a comprehensive ablation study of the Tree-Structured Parzen Estimator (TPE), a widely used Bayesian optimization method, to clarify the role of each control parameter and improve its empirical performance. The study provides actionable recommendations for parameter tuning in machine learning frameworks such as Hyperopt and Optuna, with implementations now available through OptunaHub.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers demonstrate that batch size is a critical hyperparameter systematically overlooked in LoRA fine-tuning evaluations, causing conflicting performance claims across variants. A cost-efficient tuning strategy reveals batch size's substantial impact on optimal model performance, reconciling previous contradictory results and establishing clearer evaluation standards.
AI · Bullish · arXiv – CS AI · May 29 · 6/10
🧠Researchers propose a Bayesian Optimization framework that uses pre-trained Large Language Models to efficiently search for optimal LoRA (Low-Rank Adaptation) hyperparameters by encoding domain knowledge as natural language prompts. The method discovers high-performing configurations in ~30 iterations versus 45,000 combinations, achieving 20% performance improvements while significantly reducing computational costs.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers have released LLMSYS-HPOBench, the first comprehensive benchmark suite for hyperparameter optimization in real-world LLM systems, containing 364,450 configurations across 932 settings with multiple fidelity factors and cost metrics. The dataset addresses gaps in existing AutoML benchmarks by capturing the unprecedented complexity of optimizing both AI and non-AI components in production language model systems.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce CDS4RAG, a novel optimization framework that improves Retrieval-Augmented Generation systems by cyclically optimizing retriever and generator hyperparameters separately rather than treating them as a monolithic unit. The method achieves up to 1.54x improvements in generation quality while demonstrating faster convergence across multiple benchmarks and language models.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers present HG-MS, a novel bilevel optimization method that handles cases where lower-level problems have multiple solutions along a manifold rather than a single optimum. The work provides theoretical guarantees for convergence while maintaining computational efficiency through pseudoinverse-based calculations, with practical applications demonstrated in LLM fine-tuning.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present a theoretical framework showing how mini-batch noise in Adam training affects the optimizer's implicit bias toward sharper or flatter regions of the loss landscape. They find that the optimal momentum hyperparameters shift with batch size: small batches favor the default (β₁, β₂) = (0.9, 0.999) settings, while larger batches benefit from β₁ and β₂ values that are closer together.
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose PS-PFN, an advanced AutoML method that extends traditional algorithm selection and hyperparameter optimization to handle modern ML pipelines with fine-tuning and ensembling. Using posterior sampling and prior-data fitted networks for in-context learning, the approach outperforms existing bandit and AutoML strategies on benchmark tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.