AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods
Researchers present PAVE, a theoretical and practical framework that addresses policy instability in actor-critic reinforcement learning by stabilizing the gradient field of the critic's Q-function rather than directly regularizing policy outputs. The work demonstrates that policy smoothness is fundamentally determined by the critic's differential geometry, which offers a more principled route than output-level regularization to deploying learned policies in physical systems.