AI · Neutral · arXiv – CS AI · Mar 24/106
Pessimistic Auxiliary Policy for Offline Reinforcement Learning
Researchers propose a pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions during training. The policy is obtained by maximizing the lower confidence bound of the Q-function, which steers it away from actions whose estimated value is high but may carry a large approximation error.
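To make the lower-confidence-bound idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a discrete set of candidate actions and an ensemble of Q-estimates whose disagreement stands in for uncertainty. The names `lcb_scores`, `pessimistic_action`, and the weight `beta` are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def lcb_scores(q_ensemble, beta=1.0):
    """Return mean(Q) - beta * std(Q) for each candidate action.

    q_ensemble: list of ensemble members, each a list of Q-values with one
    entry per candidate action (an assumed uncertainty model, not the paper's).
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        estimates = [member[a] for member in q_ensemble]
        # Disagreement across members acts as a proxy for estimation error.
        scores.append(mean(estimates) - beta * pstdev(estimates))
    return scores


def pessimistic_action(q_ensemble, beta=1.0):
    """Pick the action index with the highest lower confidence bound."""
    scores = lcb_scores(q_ensemble, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative values only: action 1 has the highest mean Q-value,
# but the ensemble members disagree strongly about it.
q_ensemble = [
    [1.0, 3.0, 0.8],
    [1.1, 0.2, 0.9],
    [0.9, 2.8, 1.0],
]

greedy = pessimistic_action(q_ensemble, beta=0.0)      # plain argmax of the mean
pessimistic = pessimistic_action(q_ensemble, beta=1.0)  # penalizes disagreement
print(greedy, pessimistic)
```

With `beta=0.0` the rule reduces to greedy selection on the mean estimate and picks the uncertain action 1. With `beta=1.0` the uncertainty penalty outweighs its higher mean, so the rule picks action 0, whose estimates agree closely.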