#multi-armed-bandits News & Analysis

5 articles tagged with #multi-armed-bandits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 4

Learning Contextual Runtime Monitors for Safe AI-Based Autonomy

Researchers introduce a novel framework for learning context-aware runtime monitors for AI-based control systems in autonomous vehicles. The approach uses contextual multi-armed bandits to select the best controller for current conditions rather than averaging outputs, providing theoretical safety guarantees and improved performance in simulated driving scenarios.
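To make the "select rather than average" idea concrete, here is a minimal sketch of a contextual bandit that picks one controller per context. It is not the paper's method: the class name `ContextualSelector`, the discrete context labels, the epsilon-greedy rule, and the 0/1 safety reward are all assumptions chosen for illustration, and the sketch carries none of the safety guarantees the summary mentions.

```python
import random
from collections import defaultdict


class ContextualSelector:
    """Hypothetical epsilon-greedy selector with one bandit per discrete context."""

    def __init__(self, n_controllers, epsilon=0.2, seed=0):
        self.n = n_controllers
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and running mean rewards, created on first use.
        self.counts = defaultdict(lambda: [0] * n_controllers)
        self.means = defaultdict(lambda: [0.0] * n_controllers)

    def best(self, context):
        """Controller with the highest estimated reward in this context."""
        means = self.means[context]
        return max(range(self.n), key=means.__getitem__)

    def select(self, context):
        """Explore with probability epsilon, otherwise exploit the current best."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        return self.best(context)

    def update(self, context, controller, reward):
        """Incrementally update the running mean for the chosen controller."""
        self.counts[context][controller] += 1
        n = self.counts[context][controller]
        mean = self.means[context][controller]
        self.means[context][controller] = mean + (reward - mean) / n
```

In use, a monitor would call `select(context)` before each control step, observe whether the chosen controller behaved safely, and feed that outcome back through `update`. Exactly one controller acts at each step, so outputs are never blended.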

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Structured Neuron Pruning in Deep Neural Networks Using Multi-Armed Bandits

Researchers present a novel structured pruning framework that uses multi-armed bandit algorithms to remove redundant neurons from deep neural networks. The approach treats each neuron as a bandit arm, testing its importance through temporary masking and loss measurement, then applies various MAB policies (UCB1, Thompson Sampling, etc.) to identify which neurons to prune. Experiments across tabular and deep learning tasks show MAB-based pruning significantly outperforms traditional magnitude-based and greedy pruning methods.
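The neuron-as-arm idea can be sketched with UCB1, one of the policies the summary names. This is a toy illustration under stated assumptions, not the paper's implementation: `probe_loss_increase` is a hypothetical callback standing in for "mask the neuron and measure the loss change", and the reward definition (one minus the clipped loss increase, so that harmless neurons score high) is an assumption made for this sketch.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Pick the arm with the highest UCB1 score; arms never tried go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
    )


def estimate_prunability(probe_loss_increase, n_neurons, rounds, seed=0):
    """Run UCB1 where each arm is a neuron and reward is high when masking is harmless.

    probe_loss_increase(neuron, rng) is a hypothetical callback that returns the
    (noisy) increase in loss observed when that neuron is temporarily masked.
    """
    rng = random.Random(seed)
    counts = [0] * n_neurons
    means = [0.0] * n_neurons
    for t in range(1, rounds + 1):
        neuron = ucb1_select(counts, means, t)
        loss_increase = probe_loss_increase(neuron, rng)
        # Assumed reward: 1 minus the loss increase, clipped to [0, 1].
        reward = min(1.0, max(0.0, 1.0 - loss_increase))
        counts[neuron] += 1
        means[neuron] += (reward - means[neuron]) / counts[neuron]
    return counts, means
```

Neurons with the highest mean reward are the candidates for pruning under this setup. Because UCB1 concentrates its probes on the most promising candidates, clearly important neurons are masked only a handful of times.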

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Researchers propose BaSE, a multi-armed bandit algorithm that optimizes how large language models allocate computational resources during evolutionary search tasks. By dynamically distributing LLM calls across parallel trajectories, BaSE improves mean fitness by 12.3% over existing baselines while addressing the reliability gap between reported best-case and typical run performance.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Multi-Armed Bandits With Best-Action Queries

Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that benefits depend critically on reward structure: correlated stochastic rewards cannot achieve the theoretical gains seen in full-feedback settings, while i.i.d. stochastic rewards maintain near-optimal improvements with logarithmic precision.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Researchers developed a new analysis of KL-regularized multi-armed bandits (MABs) using the KL-UCB algorithm, achieving near-optimal regret bounds. The study provides the first high-probability regret bound with linear dependence on the number of arms and establishes matching lower bounds, offering a comprehensive understanding across all regularization regimes.
