Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
Researchers propose QGF (Q-Guided Flow), a reinforcement learning algorithm that optimizes policies entirely at test time, using value gradients to guide pre-trained flow models. This avoids the training instability of traditional actor-critic approaches while remaining competitive on offline RL benchmarks.
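To make the idea of gradient guidance concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a flow model can be sampled by Euler-integrating a velocity field, and that guidance simply adds a scaled Q-gradient to that velocity at each step. The velocity field, the quadratic Q-function, the constants `BEHAVIOR_ACTION` and `BEST_ACTION`, and the `guidance_scale` parameter are all hypothetical stand-ins chosen for illustration; the actual QGF update rule may differ.

```python
import numpy as np

# Hypothetical constants for this toy example (not from the source).
BEHAVIOR_ACTION = np.array([1.0, 0.0])  # where the toy "pre-trained" flow sends samples
BEST_ACTION = np.array([0.0, 1.0])      # maximizer of the toy Q-function


def base_velocity(a, t):
    """Toy stand-in for a pre-trained flow model's velocity field v(a, t)."""
    return BEHAVIOR_ACTION - a


def q_value(a):
    """Toy Q-function: highest at BEST_ACTION (state dependence omitted)."""
    return -np.sum((a - BEST_ACTION) ** 2)


def q_grad(a):
    """Analytic gradient of q_value with respect to the action."""
    return -2.0 * (a - BEST_ACTION)


def guided_sample(a0, guidance_scale, num_steps=10):
    """Euler-integrate the flow from t=0 to t=1, adding a Q-gradient term.

    With guidance_scale=0 this reduces to sampling the unguided flow.
    No model parameters are updated; only the sampling path changes.
    """
    a = np.array(a0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        a = a + dt * (base_velocity(a, t) + guidance_scale * q_grad(a))
    return a


if __name__ == "__main__":
    a0 = np.zeros(2)  # fixed starting point; a real sampler would draw noise
    plain = guided_sample(a0, guidance_scale=0.0)
    guided = guided_sample(a0, guidance_scale=1.0)
    print("unguided action:", plain, "Q =", q_value(plain))
    print("guided action:  ", guided, "Q =", q_value(guided))
```

In this toy setting the guided sample ends up between the behavior action and the Q-maximizing action, which illustrates the trade-off a guidance scale would control: larger values favor the value estimate, smaller values stay closer to the pre-trained flow.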