#bandit-algorithms News & Analysis

6 articles tagged with #bandit-algorithms. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Efficient Adversarial Attacks on High-dimensional Offline Bandits

Researchers demonstrate that offline bandit algorithms—used to evaluate machine learning models like image generators and LLMs—are vulnerable to adversarial attacks on their reward models. The study reveals that in high-dimensional settings, attackers can achieve near-perfect success rates with imperceptibly small perturbations to publicly available reward model weights, creating a critical security gap in AI evaluation systems.

🏢 Hugging Face

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Researchers introduce Dri-MED, a machine learning algorithm designed to handle multi-armed bandit problems with personalized user preferences, drifting context distributions, and baseline performance constraints. The algorithm achieves improved regret bounds while minimizing constraint violations, demonstrating practical advantages over conservative baseline approaches in experimental settings.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Should You Use Your Large Language Model to Explore or Exploit?

Researchers evaluated current large language models' effectiveness at solving exploration-exploitation tradeoffs in decision-making tasks. The study found that while reasoning models show promise for exploitation tasks, they remain impractical due to cost and speed constraints, and all tested LLMs underperform simple linear regression—though LLMs do excel at exploring large action spaces with semantic structure.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MINTS: Minimalist Thompson Sampling

Researchers introduce MINTS (Minimalist Thompson Sampling), a Bayesian framework that simplifies sequential decision-making under uncertainty by placing priors only on optimal parameters while eliminating unnecessary variables through profile likelihood. The approach achieves near-optimal regret bounds for multi-armed bandits and automatically adapts to structural constraints, matching classical performance benchmarks.
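
For context, the classical baseline that MINTS is compared against can be sketched in a few lines. The example below is standard Beta-Bernoulli Thompson sampling, not the MINTS procedure itself; the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated arm means are all illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Classical Beta-Bernoulli Thompson sampling (illustrative baseline, not MINTS).

    true_means: hypothetical success probabilities used only to simulate rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (a Beta(1, 1) prior is assumed for every arm).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000))
```

Here a prior is kept on every arm's mean, which is the kind of full modeling the summary says MINTS avoids by placing priors only on the optimal parameters.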

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors

Researchers prove that fixed-budget best-arm identification in bandit problems is no harder than the fixed-confidence setting, up to logarithmic factors. They introduce FC2FB, a meta-algorithm that converts fixed-confidence algorithms into fixed-budget ones while maintaining optimal sample complexity. The result clarifies a previously unclear relationship between the two core best-arm identification settings and enables improved algorithms across multiple problem classes.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

This paper analyzes why reinforcement learning methods that update policies based on reward signals without explicitly tracking uncertainty can still be effective. Researchers prove that annealed softmax policies achieve near-optimal regret rates in many-armed Bayesian bandit settings when many near-optimal actions exist, providing theoretical justification for uncertainty-agnostic approaches used in modern language model training.
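
To make the idea of an uncertainty-agnostic policy concrete, here is a minimal sketch of a softmax policy over empirical means with a temperature that shrinks over time. It is not the paper's algorithm or analysis: the schedule `tau0 / sqrt(t)`, the function names, and the simulated Bernoulli arms are assumptions chosen for illustration, and the example uses only a handful of arms rather than the many-armed regime the paper studies.

```python
import math
import random


def softmax_probs(values, temperature):
    """Numerically stable softmax of values / temperature."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_softmax_bandit(true_means, horizon, tau0=1.0, seed=0):
    """Softmax over empirical means with an annealed temperature.

    No confidence intervals or posteriors are tracked; exploration comes
    only from the temperature, which decays as tau0 / sqrt(t) (assumed schedule).
    true_means are hypothetical and used only to simulate Bernoulli rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k

    for t in range(1, horizon + 1):
        tau = tau0 / math.sqrt(t)  # lower temperature means a greedier policy
        probs = softmax_probs(means, tau)
        arm = rng.choices(range(k), weights=probs)[0]

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts


if __name__ == "__main__":
    print(annealed_softmax_bandit([0.2, 0.5, 0.8], horizon=5000))
```

As the temperature falls, the policy concentrates on whichever arm currently has the highest estimated mean, so the schedule alone controls how long exploration lasts.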