AINeutralarXiv – CS AI · 10h ago6/10
🧠
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
Researchers propose a marginalized reparameterization (MRP) estimator to enable practical use of mixture policies in reinforcement learning, addressing a long-standing gap between theoretical potential and practical implementation. By reducing variance compared to likelihood-ratio methods, MRP mixture policies achieve performance parity with standard Gaussian policies while offering greater flexibility in continuous action spaces.
🏢 Google