AI · Bullish · arXiv – CS AI · 4h ago · 6/10
OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
Researchers have developed OPRIDE, an algorithm for offline preference-based reinforcement learning that significantly improves query efficiency. It addresses two key challenges, inefficient exploration and overoptimization, through a principled in-dataset exploration strategy and a discount scheduling mechanism.