AIBullisharXiv – CS AI · 3h ago6/10
🧠
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
Researchers introduce ProRL, a reinforcement learning framework designed to improve proactive recommender systems that guide users toward target items through sequential recommendations. The approach addresses fundamental gradient estimation problems in policy learning by implementing stepwise reward centering and position-specific advantage estimation, demonstrating superior performance on real-world datasets.