AIBullisharXiv – CS AI · 6h ago7/10
🧠
Reinforcement Learning Foundation Models Should Already Be A Thing
Researchers propose that reinforcement learning foundation models should be developed using synthetic MDPs (Markov Decision Processes) as training data, similar to how TabPFN uses synthetic data for tabular prediction. A Graph Attention Network trained entirely on synthetic MDPs demonstrates strong performance on both online and offline RL benchmarks without task-specific tuning, suggesting this approach is viable.