AINeutral · arXiv – CS AI · 6h ago · 6/10
🧠
Entropy-Regularized Adjoint Matching for Offline RL
Researchers introduce Maximum Entropy Adjoint Matching (ME-AM), a framework for offline reinforcement learning that combines flow-matching generative policies with entropy regularization to overcome limitations of existing Q-learning approaches. The method targets popularity bias and support-binding issues, where agents fail to discover high-reward actions lying in low-density regions of the behavior data, and demonstrates competitive performance across continuous control benchmarks.
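To make the idea concrete, here is a minimal toy sketch (hypothetical, not the paper's implementation) of the two ingredients the summary names: a conditional flow-matching regression toward a velocity target, plus an entropy-style regularizer that discourages the learned field from collapsing onto the densest "popular" action mode. The linear velocity field, the 1-D bimodal action data, and the variance-based entropy surrogate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Behavior data with popularity bias (assumed toy setup): 80% of actions
# cluster near 0 (the popular mode), 20% near 3 (a rarer, high-reward mode).
x0 = rng.normal(size=512)                       # base noise samples
x1 = np.where(rng.uniform(size=512) < 0.8,
              rng.normal(0.0, 0.2, 512),
              rng.normal(3.0, 0.2, 512))

t = rng.uniform(size=512)                       # interpolation times in [0, 1]
xt = (1 - t) * x0 + t * x1                      # linear interpolation path
target = x1 - x0                                # conditional velocity target

def loss(p, lam=0.05):
    """Flow-matching MSE plus a crude entropy surrogate on the field's output."""
    w, b = p                                    # toy linear field v(x) = w*x + b
    pred = w * xt + b
    mse = np.mean((pred - target) ** 2)
    ent = -np.log(np.var(pred) + 1e-6)          # penalize collapse to one mode
    return mse + lam * ent

# Finite-difference gradient descent (illustrative optimizer, not the paper's).
p = np.zeros(2)
for _ in range(500):
    g = np.array([(loss(p + e) - loss(p - e)) / 2e-4
                  for e in (np.array([1e-4, 0.0]), np.array([0.0, 1e-4]))])
    p -= 0.05 * g

print(loss(p) < loss(np.zeros(2)))              # regularized fit improved
```

The entropy term here is only a stand-in: a degenerate field that outputs a constant has zero output variance and is penalized, mimicking how ME-AM's entropy regularization (per the summary) keeps probability mass on low-density but high-reward actions rather than binding to the behavior data's dominant support.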