y0news

#contextual-bandits News & Analysis

17 articles tagged with #contextual-bandits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

Research examines how Large Language Models can be used to initialize contextual bandits for recommendation systems, finding that LLM-generated preferences remain effective up to 30% data corruption but can harm performance beyond 50% corruption. The study provides theoretical analysis showing when LLM warm-starts outperform cold-start approaches, with implications for AI-driven recommendation systems.
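As a rough illustration of what a warm start means here, the sketch below seeds each (context, arm) estimate of a simple UCB bandit with prior pseudo-observations, the role an LLM-generated preference could play. It is not the paper's method: the class `WarmStartUCB`, the parameters `prior_means` and `prior_weight`, the discrete contexts, and the example numbers are all assumptions made for illustration.

```python
import math


class WarmStartUCB:
    """UCB over discrete contexts; estimates can be seeded with prior pseudo-observations."""

    def __init__(self, n_arms, prior_means=None, prior_weight=0.0):
        self.n_arms = n_arms
        self.prior_means = prior_means or {}      # hypothetical prior, e.g. LLM-derived
        self.prior_weight = float(prior_weight)   # how many pseudo-observations the prior counts as
        self.sums = {}
        self.counts = {}

    def _init(self, context, arm):
        key = (context, arm)
        if key not in self.counts:
            # A pair without a prior starts cold (zero pseudo-observations).
            weight = self.prior_weight if key in self.prior_means else 0.0
            self.sums[key] = self.prior_means.get(key, 0.0) * weight
            self.counts[key] = weight
        return key

    def mean(self, context, arm):
        key = self._init(context, arm)
        return self.sums[key] / self.counts[key] if self.counts[key] > 0 else 0.0

    def select(self, context):
        keys = [self._init(context, a) for a in range(self.n_arms)]
        total = sum(self.counts[k] for k in keys)
        best_arm, best_score = 0, -math.inf
        for arm, key in enumerate(keys):
            n = self.counts[key]
            if n == 0:
                return arm  # try any arm that has neither prior nor data
            score = self.sums[key] / n + math.sqrt(2.0 * math.log(total + 1.0) / n)
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, context, arm, reward):
        key = self._init(context, arm)
        self.sums[key] += reward
        self.counts[key] += 1.0


# A prior that favours arm 1, followed by real feedback that contradicts it.
prior = {("sports", 0): 0.1, ("sports", 1): 0.9}
bandit = WarmStartUCB(n_arms=2, prior_means=prior, prior_weight=5)
first_choice = bandit.select("sports")
for _ in range(20):
    bandit.update("sports", 1, 0.0)
later_choice = bandit.select("sports")
```

In this toy setup the prior steers the first pull toward arm 1, and twenty contradicting rewards are enough to move the choice to arm 0. A larger `prior_weight` would make a misleading prior slower to overcome, which is the kind of trade-off the summary describes.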

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7

Learning to Answer from Correct Demonstrations

Researchers propose a new approach for training AI models to generate correct answers from demonstrations, using imitation learning in contextual bandits rather than traditional supervised fine-tuning. The method achieves better sample complexity and works with weaker assumptions about the underlying reward model compared to existing likelihood-maximization approaches.

AI · Neutral · arXiv – CS AI · 4d ago · 5/10

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Researchers introduce Dri-MED, a machine learning algorithm designed to handle multi-armed bandit problems with personalized user preferences, drifting context distributions, and baseline performance constraints. The algorithm achieves improved regret bounds while minimizing constraint violations, demonstrating practical advantages over conservative baseline approaches in experimental settings.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Online Pandora's Box for Contextual LLM Cascading

Researchers propose an online contextual Pandora's Box model for optimizing LLM API cascading, where decision-makers sequentially query multiple APIs and select outputs based on indirect reward feedback. The approach achieves theoretically optimal regret bounds without requiring full distribution estimation, advancing practical optimization strategies for multi-API LLM systems.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

Researchers introduce MetaRouter, a meta-learning framework that optimizes Large Language Model routing by learning individual users' implicit cost-performance preferences through minimal interaction. The system enables personalized query routing across multiple models, balancing expense reduction with performance maintenance more effectively than existing methods.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline-Online Learning

OrcaRouter is a production-ready LLM routing system that uses contextual bandits and hybrid offline-online learning to intelligently direct requests to the most appropriate language model. The system ranked second on the RouterArena leaderboard with 75.54% accuracy while maintaining low inference costs of $1.00 per 1,000 queries.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection

Researchers introduce a multi-agent framework that combines contextual bandits with semantic checkpoints to prevent 'semantic drift' in automated scientific computing workflows. The system ensures that computational strategies selected by AI agents are faithfully executed and remain causally attributable throughout multi-agent pipelines, improving convergence and robustness in adaptive decision-making.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

The Sample Complexity of Multiclass and Sparse Contextual Bandits

Researchers present optimal algorithms for sparse contextual bandits that achieve sample complexity of Õ((s/ε² + |A|/ε)log|Π|/δ), closing a gap from prior work that had exponential dependence on action set size. The results apply to multiclass classification and combinatorial semi-bandits through information-theoretic and algorithmic approaches.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Learning to Assign Prediction Tasks to Agents with Capacity Constraints

Researchers propose a machine learning framework for optimally assigning prediction tasks to heterogeneous agents (humans or AI systems) subject to capacity constraints. The work develops explore-exploit algorithms that learn agent expertise and adapt assignments dynamically, demonstrating improvements over baseline approaches across tabular, image, and text tasks.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Linear and Neural Dueling Bandits with Delayed Feedback

Researchers propose two algorithms, LDB-DF and NDB-DF, for contextual dueling bandits with delayed feedback, a critical real-world constraint in recommender systems and LLM alignment. The key component is an Inverse Probability Weighting mechanism that removes the bias introduced by delayed observations, yielding a theoretical regret bound of O(d√T) in the linear setting.
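To make the inverse-probability-weighting idea concrete, here is a minimal generic sketch, not the LDB-DF or NDB-DF algorithm. It assumes each reward arrives in time with a known probability that does not depend on the reward itself; the function names `ipw_mean` and `naive_mean` and the sample data are hypothetical.

```python
def ipw_mean(observations):
    """Average reward estimate from (reward_or_None, observe_prob) pairs.

    A reward of None means the feedback had not arrived yet. Each observed
    reward is divided by its observation probability, so in expectation the
    missing feedback is compensated for.
    """
    if not observations:
        raise ValueError("need at least one observation")
    total = 0.0
    for reward, prob in observations:
        if reward is None:
            continue  # still delayed: contributes zero
        if not 0.0 < prob <= 1.0:
            raise ValueError("observe_prob must be in (0, 1]")
        total += reward / prob
    return total / len(observations)


def naive_mean(observations):
    """Biased baseline: treats every delayed reward as if it were zero."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(r for r, _ in observations if r is not None) / len(observations)


# Two recent rounds with a 50% chance of timely feedback, two older rounds fully observed.
data = [(1.0, 0.5), (None, 0.5), (1.0, 1.0), (0.0, 1.0)]
ipw_estimate = ipw_mean(data)
naive_estimate = naive_mean(data)
```

On this sample the naive average is pulled down by the missing feedback, while the weighted estimate counts the one observed recent reward twice to stand in for the delayed one.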

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents

Researchers introduce RSCB-MC, a risk-sensitive contextual bandit system that improves how LLM-based coding agents decide whether to use external memory for debugging tasks. Rather than treating memory retrieval as a simple similarity-matching problem, the system treats it as a safety-critical control problem, achieving a 62.5% success rate with zero false positives in testing.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10

Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 6

Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Researchers developed AdvBandit, a new black-box adversarial attack method that can exploit neural contextual bandits by poisoning context data without requiring access to internal model parameters. The attack uses bandit theory and inverse reinforcement learning to adaptively learn victim policies and optimize perturbations, achieving higher victim regret than existing methods.

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10

Learning When to Trust in Contextual Bandits

Researchers propose CESA-LinUCB, a new approach to robust reinforcement learning that addresses 'Contextual Sycophancy' where evaluators are truthful in normal situations but biased in critical contexts. The method learns trust boundaries for each evaluator and achieves sublinear regret even when no evaluator is globally reliable.

AI · Neutral · arXiv – CS AI · Mar 9 · 4/10

Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions

A 4-week study comparing bandit algorithms and LLM architectures for personalized health behavior interventions found that LLM-based messaging approaches were rated more helpful than templates, but contextual bandit optimization provided no additional benefit over LLM-only methods. The research reveals a trade-off between structured exploration of behavior change techniques and generative flexibility in AI health systems.