AINeutralarXiv – CS AI · 7h ago6/10
🧠
Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions
Researchers introduce CDQAC, an offline reinforcement learning algorithm that learns effective job scheduling policies from static, suboptimal datasets rather than requiring extensive online training interactions. The breakthrough demonstrates that scheduling performance depends primarily on state-action coverage rather than trajectory quality, enabling the algorithm to learn effectively from even simple random heuristics while requiring only 1-5% of original dataset size.