AIBullisharXiv – CS AI · 8h ago7/10
🧠
Zero-Shot Off-Policy Learning
Researchers present a novel off-policy learning method that addresses distributional shift and value overestimation in zero-shot reinforcement learning by establishing a theoretical connection between successor measures and stationary density ratios. The approach enables agents to adapt to new tasks without additional training by inferring optimal importance sampling ratios on-the-fly, with successful benchmarks across motion tracking, continuous control, and long-horizon tasks.