arXiv – CS AI · 10h ago
🧠
Multi-Armed Bandits With Best-Action Queries
Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that the benefit depends critically on reward structure: with correlated stochastic rewards, the theoretical gains seen in full-feedback settings are unattainable, while with i.i.d. stochastic rewards, near-optimal improvements survive up to logarithmic factors.
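To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of a standard UCB1 learner on Bernoulli arms, extended with a hypothetical best-action oracle: each query simply reveals the identity of the optimal arm, which the learner then commits to. The function name, the commit-on-query policy, and the Bernoulli reward model are all illustrative assumptions.

```python
import math
import random

def ucb1_with_queries(means, horizon, query_budget, seed=0):
    """UCB1 on Bernoulli arms, plus a hypothetical best-action oracle.

    Each oracle query (if the budget allows) reveals the index of the
    optimal arm; this toy learner spends one query up front and commits.
    Illustrative of the query model only, not the paper's algorithm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    best = max(range(k), key=lambda i: means[i])  # oracle's answer
    committed = None
    total = 0.0
    for t in range(1, horizon + 1):
        if committed is None and query_budget > 0:
            query_budget -= 1   # spend one oracle query, learn the best arm
            committed = best
        if committed is not None:
            arm = committed
        elif t <= k:
            arm = t - 1         # initialization: play each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

Running it with and without a query budget shows the intuition behind the result: a single best-action query lets the learner skip exploration entirely, while the query-free run pays the usual exploration cost.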