AI · Neutral · arXiv CS AI · 4h ago
Pessimistic Auxiliary Policy for Offline Reinforcement Learning
Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
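To make the idea concrete, here is a minimal sketch of choosing an action by its lower confidence bound rather than by its raw value estimate. It is not the paper's implementation: the summary does not say how the bound is computed, so the ensemble of Q-estimates, the mean-minus-scaled-standard-deviation form of the bound, the `beta` weight, and the helper name `lcb_action` are all assumptions made for illustration.

```python
import numpy as np


def lcb_action(q_values: np.ndarray, beta: float = 1.0) -> int:
    """Pick the action with the highest lower confidence bound.

    q_values has shape (n_critics, n_actions): each row is one critic's
    value estimate for every candidate action. Both the ensemble and the
    beta weight are illustrative assumptions, not details from the paper.
    """
    mean = q_values.mean(axis=0)   # average estimate per action
    std = q_values.std(axis=0)     # disagreement between critics per action
    lcb = mean - beta * std        # penalize actions the critics disagree on
    return int(np.argmax(lcb))


# Toy example with three critics and three candidate actions.
# Action 1 has the highest average value, but the critics disagree sharply.
q = np.array([
    [1.0, 3.0, 1.2],
    [1.1, 0.2, 1.0],
    [0.9, 2.8, 1.1],
])

greedy = int(np.argmax(q.mean(axis=0)))   # ignores uncertainty
pessimistic = lcb_action(q, beta=1.0)     # prefers a reliably good action

print("greedy action:", greedy)
print("pessimistic action:", pessimistic)
```

In this toy setup the greedy choice is the action with the highest but most disputed estimate, while the pessimistic choice is a slightly lower-valued action on which the critics agree. Setting `beta` to 0 recovers the greedy choice.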