y0news
#pessimistic-policy
1 article
AI · Neutral · arXiv – CS AI · 4h ago

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Researchers propose a pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions during training. The policy maximizes the lower confidence bound of the Q-function, which steers it away from actions whose high estimated value may come with high estimation error.
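To make the idea concrete, here is a minimal sketch of choosing an action by maximizing a lower confidence bound instead of the raw Q-value. It is not the paper's implementation: the summary does not say how the bound is computed, so the sketch assumes an ensemble of Q estimates and takes the bound as mean minus `beta` times the standard deviation. The function name `lcb_action`, the parameter `beta`, and the toy numbers are all hypothetical.

```python
import numpy as np


def lcb_action(q_ensemble, beta=1.0):
    """Pick the action with the highest lower confidence bound.

    q_ensemble: array of shape (n_models, n_actions), one row per Q estimate.
    beta: assumed pessimism weight (hypothetical, not taken from the paper).
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)        # average estimated value per action
    std = q.std(axis=0)          # ensemble disagreement, used as an error proxy
    lcb = mean - beta * std      # penalize actions the ensemble disagrees on
    return int(np.argmax(lcb)), lcb


# Toy estimates: action 1 has the highest mean but the models disagree on it.
q_ensemble = np.array([
    [1.0, 3.0, 0.5],
    [1.1, 0.2, 0.6],
    [0.9, 2.8, 0.4],
])

greedy = int(np.argmax(q_ensemble.mean(axis=0)))
pessimistic, lcb = lcb_action(q_ensemble, beta=1.0)
print("greedy action:", greedy)
print("pessimistic action:", pessimistic)
```

In this toy case the greedy choice is the action with the high but uncertain estimate, while the lower-confidence-bound choice falls back to an action the estimates agree on, which is the behavior the summary describes.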