AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Reinforcement Learning for Flow-Matching Policies with Density Transport
Researchers present RLDT, a reinforcement learning algorithm that fine-tunes flow-matching policies by treating policy improvement as density transport toward high-reward regions. The method addresses limitations in existing approaches by preserving multimodal modeling capacity while using Stein Variational Gradient Descent and expected-target estimation to stabilize training across continuous-control tasks.
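For readers unfamiliar with Stein Variational Gradient Descent, the following is a minimal NumPy sketch of a generic SVGD particle update on a toy Gaussian target. It is not the RLDT algorithm and does not reproduce the paper's expected-target estimation; the function names, the RBF kernel with a fixed bandwidth, the step size, and the toy target are all assumptions chosen for illustration.

```python
import numpy as np


def svgd_step(x, score_fn, step_size=0.1, bandwidth=1.0):
    """One generic SVGD update for particles x of shape (n, d).

    Hypothetical helper for illustration; not taken from the RLDT paper.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]  # diff[i, j] = x_i - x_j
    # RBF kernel matrix, k[i, j] = exp(-||x_i - x_j||^2 / (2 h^2))
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    # Attraction: kernel-weighted average of the target's score at each particle
    attract = k @ score_fn(x)
    # Repulsion: gradient of the kernel, which pushes nearby particles apart
    repulse = np.sum(k[:, :, None] * diff, axis=1) / bandwidth ** 2
    return x + step_size * (attract + repulse) / n


def gaussian_score(x):
    """Score (gradient of log-density) of a standard normal target."""
    return -x


def run_svgd(n_particles=50, dim=2, n_steps=500, seed=0):
    """Transport particles that start far from the target toward it."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=3.0, scale=0.5, size=(n_particles, dim))
    for _ in range(n_steps):
        x = svgd_step(x, gaussian_score)
    return x


if __name__ == "__main__":
    particles = run_svgd()
    print("mean:", particles.mean(axis=0))
    print("std:", particles.std(axis=0))
```

The attraction term moves particles toward high-density regions of the target, while the repulsion term keeps them spread out. That spread is the general property that makes SVGD a natural fit for preserving multiple modes.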