AINeutralarXiv โ CS AI ยท 8h ago4/10
๐ง
Learning When to Trust in Contextual Bandits
Researchers propose CESA-LinUCB, a new approach to robust reinforcement learning that addresses 'Contextual Sycophancy' where evaluators are truthful in normal situations but biased in critical contexts. The method learns trust boundaries for each evaluator and achieves sublinear regret even when no evaluator is globally reliable.