AI · Neutral · arXiv – CS AI · 7h ago · 6/10
Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems
Researchers propose 'Markov decision contests', a reinforcement learning framework that learns from pairwise preferences instead of scalar rewards. They prove that stationary Markov policies are optimal in this framework and report more efficient learning on long-horizon problems than existing methods.