#regret-bounds News & Analysis

10 articles tagged with #regret-bounds. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-Regule par Coherence et Performance

Researchers introduce EARCP, a new ensemble architecture for AI that dynamically weights different expert models based on performance and coherence. The system provides theoretical guarantees with sublinear regret bounds and has been tested on time series forecasting, activity recognition, and financial prediction tasks.
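For readers unfamiliar with performance-based expert weighting, the sketch below shows the classical Hedge (exponential weights) scheme, one standard way to obtain sublinear regret against the best expert. It is not the EARCP algorithm: the summary does not describe EARCP's coherence term or its exact update rule, so none of that is modeled here. The function name `hedge`, the loss-matrix layout, and the learning rate `eta` are illustrative assumptions.

```python
import math


def _normalize(log_w):
    """Turn log-weights into a probability vector (numerically stable)."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    s = sum(w)
    return [x / s for x in w]


def hedge(losses, eta):
    """Classical Hedge over a T x N matrix of expert losses in [0, 1].

    Returns (final_weights, learner_loss), where learner_loss is the
    cumulative expected loss of playing the weighted mixture each round.
    Illustrative only; this is NOT the EARCP update.
    """
    n = len(losses[0])
    log_w = [0.0] * n  # start from uniform weights
    learner_loss = 0.0
    for round_losses in losses:
        p = _normalize(log_w)
        # Expected loss of the current mixture on this round.
        learner_loss += sum(pi * li for pi, li in zip(p, round_losses))
        # Exponentially down-weight experts in proportion to their loss.
        log_w = [v - eta * l for v, l in zip(log_w, round_losses)]
    return _normalize(log_w), learner_loss


if __name__ == "__main__":
    T, N = 200, 3
    # Hypothetical data: expert 0 is consistently better than the others.
    losses = [[0.1, 0.9, 0.5] for _ in range(T)]
    eta = math.sqrt(8 * math.log(N) / T)  # standard tuning for known T
    weights, learner_loss = hedge(losses, eta)
    best_expert_loss = min(sum(row[i] for row in losses) for i in range(N))
    print("final weights:", weights)
    print("regret:", learner_loss - best_expert_loss)
```

With `eta = sqrt(8 ln N / T)` and losses in [0, 1], the standard analysis bounds the regret by `sqrt((T / 2) ln N)`, which grows sublinearly in `T`.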

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3

What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

Researchers prove 'selection theorems' showing that AI agents achieving low regret on prediction tasks must develop internal predictive models and belief states. The work demonstrates that structured internal representations are mathematically necessary, not just helpful, for competent decision-making under uncertainty.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Provably Efficient Personalized Multi-Objective Bandits with Proactive Conversational Queries

Researchers present MO-PQUCB, a novel algorithm for personalized multi-objective decision-making that combines conversational queries with bandit feedback to learn user preferences more efficiently. The method uses a Plackett-Luce choice model and shift-invariant regularization to overcome fundamental learning barriers, demonstrating improved regret scaling and robustness to corrupted preference signals compared to existing approaches.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Efficient Exploration for Iterative Nash Preference Optimization

Researchers propose an improved Nash Learning from Human Feedback (NLHF) algorithm that addresses exploration challenges in preference alignment for large language models. The new method achieves better regret bounds without exponential dependence on regularization parameters and demonstrates empirical improvements when fine-tuning Llama-3-8B.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MINTS: Minimalist Thompson Sampling

Researchers introduce MINTS (Minimalist Thompson Sampling), a Bayesian framework that simplifies sequential decision-making under uncertainty by placing priors only on optimal parameters while eliminating unnecessary variables through profile likelihood. The approach achieves near-optimal regret bounds for multi-armed bandits and automatically adapts to structural constraints, matching classical performance benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Multi-Armed Bandits With Best-Action Queries

Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that benefits depend critically on reward structure: correlated stochastic rewards cannot achieve the theoretical gains seen in full-feedback settings, while i.i.d. stochastic rewards maintain near-optimal improvements with logarithmic precision.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

Researchers have resolved a longstanding open problem in robust dynamic pricing by developing a binary search variant that achieves decoupled regret bounds of O(C + log T) when corruption is known and O(C + log² T) when unknown, significantly improving upon the previous O(C log log T) bound from 2025.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Towards Differentially Private Reinforcement Learning with General Function Approximation

Researchers present the first theoretical framework for differentially private reinforcement learning with general function approximation, achieving regret bounds of Õ(K^{3/5}) that match linear-case performance. This breakthrough extends privacy guarantees beyond tabular and linear settings, combining batched policy updates with the exponential mechanism for improved privacy-utility tradeoffs in online RL systems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Ensemble Distributionally Robust Bayesian Optimisation

Researchers propose a novel Ensemble Distributionally Robust Bayesian Optimisation algorithm that addresses context distributional uncertainty in zeroth-order optimization. The method achieves sublinear regret bounds while remaining computationally tractable, improving upon existing state-of-the-art approaches.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Researchers developed a new analysis of KL-regularized multi-armed bandits (MABs) using the KL-UCB algorithm, achieving near-optimal regret bounds. The study provides the first high-probability regret bound with linear dependence on the number of arms and establishes matching lower bounds, offering a comprehensive understanding across all regularization regimes.