arXiv – CS AI · 7h ago
Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes
Researchers introduce QR-MAX, a model-based reinforcement learning algorithm for discrete-action non-Markovian reward decision processes (NMRDPs). In these processes, rewards depend on the full history of the system rather than on the current state alone. The algorithm comes with formal PAC convergence guarantees and polynomial sample complexity. It advances an area of RL that has so far received little theoretical attention and is relevant to tasks with temporal dependencies.
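A small toy example makes "rewards depend on the full history" concrete. The sketch below is not QR-MAX and does not reproduce the paper's method. It only illustrates a non-Markovian reward and one common way of handling it: a small finite automaton tracks the part of the history that matters, so the reward becomes Markovian on the product state (automaton state, environment state). The task, the state names ("key", "goal"), and all function names are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical toy task (illustration only, not from the paper):
# reward 1.0 is paid on reaching "goal", but only if "key" was visited earlier.


def history_reward(history: Sequence[str]) -> float:
    """Non-Markovian reward: it inspects the whole trajectory, not just the last state."""
    if not history or history[-1] != "goal":
        return 0.0
    return 1.0 if "key" in history[:-1] else 0.0


def automaton_step(q: int, s: str) -> int:
    """Two-state automaton: q = 0 means 'key not yet seen', q = 1 means 'key seen'."""
    return 1 if (q == 1 or s == "key") else 0


def product_reward(q_prev: int, s: str) -> float:
    """Markovian reward on the product state (automaton state before s, current state s)."""
    return 1.0 if (s == "goal" and q_prev == 1) else 0.0


def rewards_via_product(history: Sequence[str]) -> List[float]:
    """Compute per-step rewards by running the automaton alongside the trajectory."""
    q = 0
    rewards: List[float] = []
    for s in history:
        rewards.append(product_reward(q, s))
        q = automaton_step(q, s)
    return rewards


if __name__ == "__main__":
    trace = ["start", "goal", "key", "goal"]
    direct = [history_reward(trace[: i + 1]) for i in range(len(trace))]
    print(direct)                      # [0.0, 0.0, 0.0, 1.0]
    print(rewards_via_product(trace))  # [0.0, 0.0, 0.0, 1.0]
```

In this trace the state "goal" is visited twice but earns different rewards (0.0 the first time, 1.0 the second), so no reward function of the current state alone can reproduce it. Adding the one bit of memory held by the automaton restores the Markov property, which is the kind of structure a model-based learner for NMRDPs can exploit.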