y0news

#contextual-bandits News & Analysis

7 articles tagged with #contextual-bandits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 6 · 7/10
🧠

Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

Research examines how Large Language Models can be used to initialize contextual bandits for recommendation systems, finding that LLM-generated preferences remain effective up to 30% data corruption but can harm performance beyond 50% corruption. The study provides theoretical analysis showing when LLM warm-starts outperform cold-start approaches, with implications for AI-driven recommendation systems.
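The summary doesn't spell out the paper's initialization scheme; as a rough sketch, one common way to warm-start a linear contextual bandit is to fold a prior parameter into the ridge-regression target of plain LinUCB. Here `prior_theta` stands in for LLM-elicited preferences — that mapping, and the whole setup, is an assumption for illustration:

```python
import numpy as np

def linucb_run(contexts, true_theta, prior_theta=None, alpha=1.0, lam=1.0, seed=0):
    """Run LinUCB on a toy linear bandit and return cumulative regret.
    If prior_theta is given, warm-start the ridge estimate by treating the
    prior as pseudo-observations (a hypothetical warm-start scheme)."""
    rng = np.random.default_rng(seed)
    d = true_theta.shape[0]
    A = lam * np.eye(d)                                   # regularized Gram matrix
    b = lam * prior_theta if prior_theta is not None else np.zeros(d)
    regret = 0.0
    for X in contexts:                                    # X: (n_arms, d) per round
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        # UCB score: estimated reward plus exploration bonus x^T A^{-1} x
        ucb = X @ theta_hat + alpha * np.sqrt(np.einsum('ad,dc,ac->a', X, A_inv, X))
        a = int(np.argmax(ucb))
        reward = X[a] @ true_theta + 0.1 * rng.normal()   # noisy linear reward
        regret += np.max(X @ true_theta) - X[a] @ true_theta
        A += np.outer(X[a], X[a])
        b += reward * X[a]
    return regret
```

With a perfect prior the initial estimate equals the true parameter, so the warm-started run mostly skips the cold-start exploration phase; the paper's corruption thresholds correspond to degrading that prior.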

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠

Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

Researchers establish sharper sample-complexity bounds for offline policy learning with f-divergence regularization in contextual bandits. The study demonstrates optimal O(ε⁻¹) sample complexity under a single-policy concentrability condition, significantly improving upon existing bounds.
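For intuition on what the regularizer does: KL divergence (the f-divergence with f(t) = t log t) admits a closed-form regularized-optimal policy in the single-context case. A minimal sketch of that known closed form, not the paper's algorithm:

```python
import numpy as np

def kl_regularized_policy(rewards, reference, beta):
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || mu):
    pi*(a) ∝ mu(a) * exp(r(a) / beta). KL is the f-divergence with
    f(t) = t log t; other choices of f change the penalty's shape."""
    logits = np.log(reference) + rewards / beta
    w = np.exp(logits - logits.max())      # subtract max for numerical stability
    return w / w.sum()
```

Large beta pulls the policy toward the reference mu; small beta concentrates mass on the highest-reward action — the regularization strength is exactly what trades off pessimism against greediness in the offline setting.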

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠

Learning to Answer from Correct Demonstrations

Researchers propose a new approach for training AI models to generate correct answers from demonstrations, using imitation learning in contextual bandits rather than traditional supervised fine-tuning. The method achieves better sample complexity and works with weaker assumptions about the underlying reward model compared to existing likelihood-maximization approaches.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10
🧠

Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.
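The variance-reduction mechanism itself isn't described in the summary; for reference, vanilla EXP4 with the standard importance-weighted reward estimate — the high-variance baseline such work improves on — can be sketched as:

```python
import numpy as np

def exp4(expert_advice, rewards, eta=0.5, seed=0):
    """Vanilla EXP4. expert_advice: (T, n_experts, n_arms) rows of
    probabilities; rewards: (T, n_arms) in [0, 1]. Returns total reward
    collected and the final (unnormalized) expert weights."""
    rng = np.random.default_rng(seed)
    T, n_experts, n_arms = expert_advice.shape
    w = np.ones(n_experts)
    total = 0.0
    for t in range(T):
        p_experts = w / w.sum()
        p_arms = p_experts @ expert_advice[t]          # mixture policy over arms
        a = rng.choice(n_arms, p=p_arms)
        total += rewards[t, a]
        # Importance-weighted estimate: unbiased, but variance blows up
        # when p_arms[a] is small -- the issue variance reduction targets.
        r_hat = np.zeros(n_arms)
        r_hat[a] = rewards[t, a] / p_arms[a]
        w *= np.exp(eta * expert_advice[t] @ r_hat)    # per-expert estimated gain
    return total, w
```

In the hierarchical-inference setting of the paper, feedback arriving only at terminal layers makes `p_arms[a]` for rarely-reached routes tiny, which is precisely where the plain estimator above becomes unstable.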

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10
🧠

Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Researchers developed AdvBandit, a new black-box adversarial attack method that can exploit neural contextual bandits by poisoning context data without requiring access to internal model parameters. The attack uses bandit theory and inverse reinforcement learning to adaptively learn victim policies and optimize perturbations, achieving higher victim regret than existing methods.
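AdvBandit's actual pipeline (black-box, with inverse RL to recover the victim policy) is more elaborate than the summary conveys; but the core vulnerability — that small context perturbations can steer a linear victim's arm choice — is easy to illustrate. A white-box toy with a hypothetical helper, not the paper's attack:

```python
import numpy as np

def poison_context(X, theta_hat, target_arm, eps):
    """Toy context poisoning: nudge each arm's context along the victim's
    estimated score direction so target_arm wins the argmax, with an L2
    budget of eps per arm. Uses theta_hat directly (white-box shortcut);
    AdvBandit instead learns the victim policy from black-box interaction."""
    X_adv = X.copy()
    direction = theta_hat / np.linalg.norm(theta_hat)
    X_adv[target_arm] += eps * direction        # raise the target's score
    for a in range(len(X)):
        if a != target_arm:
            X_adv[a] -= eps * direction         # lower competitors' scores
    return X_adv
```

Forcing the bandit onto a suboptimal arm each round is exactly what drives up victim regret, the attack-success metric the paper reports.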

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10
🧠

Learning When to Trust in Contextual Bandits

Researchers propose CESA-LinUCB, a new approach to robust reinforcement learning that addresses 'Contextual Sycophancy' where evaluators are truthful in normal situations but biased in critical contexts. The method learns trust boundaries for each evaluator and achieves sublinear regret even when no evaluator is globally reliable.
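The CESA-LinUCB mechanics aren't given in the summary; the "learn how much to trust each evaluator" idea can be caricatured with a per-evaluator running agreement rate against realized rewards. An illustrative stand-in, not the paper's method:

```python
import numpy as np

def update_trust(trust, counts, evaluator_scores, realized_reward, tol=0.2):
    """Toy trust update: trust[i] tracks the fraction of rounds on which
    evaluator i's score landed within tol of the realized reward,
    maintained as an incremental mean."""
    for i, s in enumerate(evaluator_scores):
        agree = float(abs(s - realized_reward) <= tol)
        counts[i] += 1
        trust[i] += (agree - trust[i]) / counts[i]   # running agreement rate
    return trust, counts
```

A context-conditional version of this — trusting an evaluator in some regions of context space and discounting it in others — is what lets the paper's approach cope with evaluators that are truthful in normal situations but biased in critical ones.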

AI · Neutral · arXiv – CS AI · Mar 9 · 4/10
🧠

Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions

A 4-week study comparing bandit algorithms and LLM architectures for personalized health behavior interventions found that LLM-based messaging approaches were rated more helpful than templates, but contextual bandit optimization provided no additional benefit over LLM-only methods. The research reveals a trade-off between structured exploration of behavior change techniques and generative flexibility in AI health systems.