arXiv – CS AI
A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs
Researchers present a new harmonic mean formulation of average reward reinforcement learning in semi-Markov decision processes (SMDPs). It addresses a critical gap: existing algorithms fail when the reward and duration distributions are non-stationary. The formulation supports more robust model-free learning algorithms for infinite-horizon tasks, where the traditional objective of optimizing the reward-to-duration ratio becomes mathematically incorrect.
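The summary does not give the paper's equations. As a hedged illustration only, and not the authors' formulation, the sketch below uses made-up transition data to show two general facts about SMDP reward rates. First, the long-run rate (total reward divided by total elapsed time) differs from a naive arithmetic mean of per-transition rates r_i / tau_i. Second, when every reward is positive, that same long-run rate equals a reward-weighted harmonic mean of the per-transition rates. The function names and sample numbers are hypothetical.

```python
from fractions import Fraction


def ratio_of_sums(rewards, durations):
    """Long-run reward rate: total reward divided by total elapsed time."""
    return Fraction(sum(rewards)) / Fraction(sum(durations))


def mean_of_ratios(rewards, durations):
    """Naive estimate: arithmetic mean of per-transition rates r_i / tau_i."""
    rates = [Fraction(r) / Fraction(t) for r, t in zip(rewards, durations)]
    return sum(rates) / len(rates)


def reward_weighted_harmonic_mean(rewards, durations):
    """Weighted harmonic mean of per-transition rates with weights w_i = r_i.

    Assumes every reward is strictly positive, so no rate is zero.
    Since w_i / x_i = r_i / (r_i / tau_i) = tau_i, this reduces to
    sum(r_i) / sum(tau_i), the ratio of sums.
    """
    rates = [Fraction(r) / Fraction(t) for r, t in zip(rewards, durations)]
    weights = [Fraction(r) for r in rewards]
    return sum(weights) / sum(w / x for w, x in zip(weights, rates))


if __name__ == "__main__":
    # Hypothetical data: two transitions with equal reward but different durations.
    rewards, durations = [2, 2], [1, 3]
    print(ratio_of_sums(rewards, durations))                  # 1
    print(mean_of_ratios(rewards, durations))                 # 4/3
    print(reward_weighted_harmonic_mean(rewards, durations))  # 1
```

Exact `Fraction` arithmetic is used so the equality between the ratio of sums and the weighted harmonic mean holds exactly rather than up to floating-point error.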