AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce EARCP, a new ensemble architecture for AI that dynamically weights different expert models based on performance and coherence. The system provides theoretical guarantees with sublinear regret bounds and has been tested on time series forecasting, activity recognition, and financial prediction tasks.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠Researchers prove 'selection theorems' showing that AI agents achieving low regret on prediction tasks must develop internal predictive models and belief states. The work demonstrates that structured internal representations are mathematically necessary, not just helpful, for competent decision-making under uncertainty.
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠Researchers propose an improved Nash Learning from Human Feedback (NLHF) algorithm that addresses exploration challenges in preference alignment for large language models. The new method achieves better regret bounds without exponential dependence on regularization parameters and demonstrates empirical improvements when fine-tuning Llama-3-8B.
🧠 Llama
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠Researchers introduce MINTS (Minimalist Thompson Sampling), a Bayesian framework that simplifies sequential decision-making under uncertainty by placing priors only on optimal parameters while eliminating unnecessary variables through profile likelihood. The approach achieves near-optimal regret bounds for multi-armed bandits and automatically adapts to structural constraints, matching classical performance benchmarks.
AI · Neutral · arXiv – CS AI · May 12 · 5/10
🧠Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that benefits depend critically on reward structure: correlated stochastic rewards cannot achieve the theoretical gains seen in full-feedback settings, while i.i.d. stochastic rewards maintain near-optimal improvements with logarithmic precision.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers have resolved a longstanding open problem in robust dynamic pricing by developing a binary search variant that achieves decoupled regret bounds of O(C + log T) when corruption is known and O(C + log² T) when unknown, significantly improving upon the previous O(C log log T) bound from 2025.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present the first theoretical framework for differentially private reinforcement learning with general function approximation, achieving regret bounds of Õ(K^(3/5)) that match linear-case performance. This breakthrough extends privacy guarantees beyond tabular and linear settings, combining batched policy updates with the exponential mechanism for improved privacy-utility tradeoffs in online RL systems.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose a novel Ensemble Distributionally Robust Bayesian Optimisation algorithm that addresses context distributional uncertainty in zeroth-order optimization. The method achieves sublinear regret bounds while remaining computationally tractable, improving upon existing state-of-the-art approaches.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4
🧠Researchers developed a new analysis of KL-regularized multi-armed bandits (MABs) using the KL-UCB algorithm, achieving near-optimal regret bounds. The study provides the first high-probability regret bound with linear dependence on the number of arms and establishes matching lower bounds, offering a comprehensive understanding across all regularization regimes.