AINeutralarXiv – CS AI · 6h ago6/10
🧠
Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets
Researchers propose Bellman-Taylor score decoding, a novel deep reinforcement learning framework designed to handle Markov decision processes with state-dependent action constraints common in operations research. The method decouples policy learning into a Euclidean score space while maintaining feasibility through an action decoder, enabling standard DRL algorithms to optimize complex systems like queueing networks without architectural modifications.