AI · Neutral · arXiv – CS AI · 11h ago · 5/10
UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
Researchers introduce UBP2, a model-based method for preference-based reinforcement learning that improves sample efficiency by directing exploration with uncertainty estimates over the reward, dynamics, and value functions. The approach comes with sublinear regret guarantees and demonstrates substantially higher sample efficiency than existing methods on benchmark tasks.