AI · Neutral · arXiv – CS AI · 14h ago · 6/10
Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk
Researchers present Nested Causal Thompson Sampling (NCTS), a machine learning framework for sequential decision-making in which strategic choices causally influence subsequent tactical decisions across multiple timescales. The work introduces PAC-Bayesian risk bounds that allow deployment policies to be certified off-policy from historical data alone, supporting a safer handover from legacy systems to learned agents.