AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Trust Region Q Adjoint Matching
Researchers introduce Trust Region Q-Adjoint Matching (TRQAM), a reinforcement learning algorithm that stabilizes off-policy fine-tuning of pretrained flow policies by using trust-region constraints to adaptively control how far the policy deviates. The method achieves a 68% success rate on offline RL tasks, compared with 46% for previous approaches.