7 articles tagged with #contextual-bandits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv — CS AI · Apr 6 · 7/10
🧠 Research examines how Large Language Models can be used to initialize contextual bandits for recommendation systems, finding that LLM-generated preferences remain effective up to 30% data corruption but can harm performance beyond 50% corruption. The study provides theoretical analysis showing when LLM warm-starts outperform cold-start approaches, with implications for AI-driven recommendation systems.
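The warm-start idea can be sketched in a linear-bandit setting: instead of starting LinUCB from the usual identity prior, the Gram matrix and reward vector are seeded with LLM-generated preference data. This is an illustrative sketch with hypothetical helper names (`warm_start_linucb`, `linucb_score`), not the paper's actual construction.

```python
import numpy as np

def warm_start_linucb(llm_contexts, llm_rewards, d, lam=1.0):
    """Initialize LinUCB statistics (A, b) from LLM-generated
    preference data instead of the cold-start identity prior."""
    A = lam * np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)
    for x, r in zip(llm_contexts, llm_rewards):
        A += np.outer(x, x)        # accumulate synthetic observations
        b += r * x
    return A, b

def linucb_score(A, b, x, alpha=1.0):
    """Upper confidence bound for a context/arm feature vector x."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b              # ridge estimate of reward weights
    return theta @ x + alpha * np.sqrt(x @ A_inv @ x)
```

With clean LLM preferences the warm-started estimate tracks the true reward weights immediately; the paper's corruption thresholds describe how far this seeding can be poisoned before it does worse than the cold start.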
AI · Bullish · arXiv — CS AI · Feb 27 · 7/10
🧠 Researchers achieved breakthrough sample complexity improvements for offline reinforcement learning algorithms using f-divergence regularization, particularly for contextual bandits. The study demonstrates optimal O(ε⁻¹) sample complexity under single-policy concentrability conditions, significantly improving upon existing bounds.
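The general shape of an f-divergence-regularized offline objective can be illustrated with the chi-square divergence: the importance-weighted value estimate of a candidate policy is penalized by how far its weights drift from the logging policy. This is a generic illustrative form (hypothetical function name, χ² chosen for simplicity), not the paper's exact estimator or its optimal-rate analysis.

```python
import numpy as np

def chi2_regularized_value(pi, mu, rewards, actions, lam=0.1):
    """Off-policy value of target policy `pi` from data logged under
    `mu`, penalized by a chi-square (an f-divergence) term on the
    importance weights pi/mu."""
    w = pi[actions] / mu[actions]            # per-sample importance weights
    value = np.mean(w * rewards)             # IPS value estimate
    penalty = lam * np.mean((w - 1.0) ** 2)  # chi-square divergence proxy
    return value - penalty
```

When the candidate policy equals the logging policy the weights are all 1, the penalty vanishes, and the objective reduces to the empirical mean reward; the penalty grows as the policy relies on poorly covered actions, which is what the single-policy concentrability condition controls.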
AI · Neutral · arXiv — CS AI · Feb 27 · 7/10
🧠 Researchers propose a new approach for training AI models to generate correct answers from demonstrations, using imitation learning in contextual bandits rather than traditional supervised fine-tuning. The method achieves better sample complexity and works with weaker assumptions about the underlying reward model compared to existing likelihood-maximization approaches.
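The imitation-as-bandit framing can be sketched as follows: instead of maximizing the likelihood of the demonstrated answer, the learner samples an action from its current policy and receives bandit feedback (reward 1 only if it matches the expert), then applies a policy-gradient step on the sampled arm. This is a minimal REINFORCE-style sketch under that framing, not the paper's actual algorithm or guarantees.

```python
import numpy as np

def imitation_bandit_update(theta, x, expert_action, n_actions, lr=0.1):
    """One imitation-learning step with bandit feedback: sample from
    the softmax policy, reward = 1 iff the sample matches the expert
    demonstration, then take a policy-gradient step."""
    logits = theta @ x                       # (n_actions,) scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = np.random.choice(n_actions, p=probs)
    reward = 1.0 if a == expert_action else 0.0
    # grad of log pi(a|x) w.r.t. theta: (e_a - probs) outer x
    grad = -probs[:, None] * x[None, :]
    grad[a] += x
    return theta + lr * reward * grad
```

Because only the sampled action's outcome is observed, the learner never needs full likelihoods over all answers, which is one intuition for why weaker reward-model assumptions can suffice.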
AI · Bullish · arXiv — CS AI · Mar 5 · 5/10
🧠 Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.
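For context, the plain EXP4 update the paper builds on looks like this: each expert recommends a distribution over arms, the chosen arm's reward is importance-weighted by the probability of having played it, and experts are credited in proportion to how much probability they placed on that arm. This sketch shows standard EXP4 only; the paper's variance-reduced variant (e.g. adding control-variate terms to tame the 1/p weights) is not reproduced here.

```python
import numpy as np

def exp4_policy(log_w, advice):
    """Mix expert advice (n_experts x n_arms rows of arm
    distributions) by the current expert weights to get arm probs."""
    w = np.exp(log_w)
    w /= w.sum()
    return w @ advice

def exp4_step(log_w, advice, arm, reward, prob, eta=0.1):
    """One EXP4 update after playing `arm` with probability `prob`."""
    est = reward / prob                   # importance-weighted reward
    expert_reward = advice[:, arm] * est  # credit per expert
    log_w = log_w + eta * expert_reward   # exponential-weights update
    return log_w - log_w.max()            # renormalize for stability
```

The 1/prob factor is exactly the variance source the summary refers to: when feedback only arrives at terminal layers and good arms are rarely played, these estimates explode, which motivates the variance-reduced variant.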
AI · Bearish · arXiv — CS AI · Mar 3 · 7/10
🧠 Researchers developed AdvBandit, a new black-box adversarial attack method that can exploit neural contextual bandits by poisoning context data without requiring access to internal model parameters. The attack uses bandit theory and inverse reinforcement learning to adaptively learn victim policies and optimize perturbations, achieving higher victim regret than existing methods.
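The black-box threat model can be illustrated with the simplest possible baseline: the attacker only queries the victim's arm choice and searches for a small context perturbation that steers it to a suboptimal arm. This is a generic random-search sketch of the attack *setting* (hypothetical names throughout), not the AdvBandit method, which learns the victim policy adaptively rather than searching blindly.

```python
import numpy as np

def blackbox_poison(context, victim_choice_fn, target_arm,
                    eps=0.1, trials=200, seed=0):
    """Search for a perturbation within an eps-ball that makes the
    (black-box) victim policy pick `target_arm`; return the original
    context if none is found."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        delta = rng.normal(size=context.shape)
        delta *= eps / max(np.linalg.norm(delta), 1e-12)  # project to eps-ball
        if victim_choice_fn(context + delta) == target_arm:
            return context + delta
    return context
```

Only the victim's chosen arm is observed per query, matching the summary's point that no internal model parameters are needed; the attack's strength then depends entirely on how efficiently those queries are spent.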
AI · Neutral · arXiv — CS AI · Mar 17 · 4/10
🧠 Researchers propose CESA-LinUCB, a new approach to robust reinforcement learning that addresses 'Contextual Sycophancy' where evaluators are truthful in normal situations but biased in critical contexts. The method learns trust boundaries for each evaluator and achieves sublinear regret even when no evaluator is globally reliable.
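The per-evaluator trust-boundary idea can be sketched with a deliberately simple stand-in: each evaluator is trusted only when the current context falls inside a learned region (here a ball with a hypothetical center and radius), and untrusted feedback is simply excluded. The actual CESA-LinUCB trust boundaries and their integration with LinUCB's confidence bounds are not reproduced here.

```python
import numpy as np

def trusted_feedback(context, evaluations, centers, radii):
    """Aggregate evaluator feedback, using an evaluator only when the
    context lies inside its trust region (ball around centers[i] of
    radius radii[i] -- a simplified stand-in for learned boundaries)."""
    trusted = [e for e, c, r in zip(evaluations, centers, radii)
               if np.linalg.norm(context - c) <= r]
    if not trusted:
        return None                  # no evaluator trusted for this context
    return float(np.mean(trusted))
```

The key property the summary highlights survives even in this toy version: no single evaluator needs to be reliable everywhere, as long as every context region is covered by someone trustworthy there.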
AI · Neutral · arXiv — CS AI · Mar 9 · 4/10
🧠 A 4-week study comparing bandit algorithms and LLM architectures for personalized health behavior interventions found that LLM-based messaging approaches were rated more helpful than templates, but contextual bandit optimization provided no additional benefit over LLM-only methods. The research reveals a trade-off between structured exploration of behavior change techniques and generative flexibility in AI health systems.