arXiv – CS AI
AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
AdaGamma is a deep reinforcement learning method that learns a state-dependent discount factor, so the weight placed on future rewards can vary from state to state. A return-consistency regularization objective addresses the instability seen in prior approaches to learned discounting. The authors report empirical improvements when the method is integrated into popular algorithms such as SAC and PPO, along with gains validated in a real-world logistics deployment.
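
To make the idea of state-dependent discounting concrete, here is a minimal sketch of a return computed with a per-state discount instead of a single constant. It is not AdaGamma itself: the recursion G_t = r_t + gamma(s_t) * G_{t+1} is one common convention and is an assumption here, the names `state_dependent_returns` and `gamma_fn` are hypothetical, and the hand-written `gamma_fn` stands in for what the paper learns. The return-consistency regularizer is not shown because the summary does not specify it.

```python
from typing import Callable, List, Sequence


def state_dependent_returns(
    rewards: Sequence[float],
    states: Sequence[int],
    gamma_fn: Callable[[int], float],
) -> List[float]:
    """Compute returns with a per-state discount factor.

    Assumed recursion (illustrative, not taken from the paper):
        G_t = r_t + gamma_fn(s_t) * G_{t+1},  with G_T = 0.
    """
    if len(rewards) != len(states):
        raise ValueError("rewards and states must have the same length")

    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma_fn(states[t]) * running
        returns[t] = running
    return returns


# Hypothetical hand-written discount: state 1 is treated as short-horizon.
def example_gamma(state: int) -> float:
    return 0.5 if state == 1 else 0.9


example = state_dependent_returns([1.0, 1.0, 1.0], [0, 1, 2], example_gamma)
```

With a constant `gamma_fn` this reduces to the usual discounted return, so the only change from the standard setup is that the discount is looked up per state. In this toy trajectory, the lower discount in state 1 shrinks the return seen from every earlier step.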