AI · Neutral · arXiv – CS AI · 8h ago · 6/10
Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration
Researchers introduce Model-Driven Policy Optimization (MDPO), a framework that enhances gradient-based optimization in differentiable simulators by incorporating adaptive stochastic exploration. The method dynamically adjusts noise injection based on gradient sensitivity, enabling better navigation of complex optimization landscapes and outperforming both deterministic planning and model-free reinforcement learning approaches on nonlinear benchmark tasks.
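
To make the idea of sensitivity-dependent noise concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the toy simulator, the function names (`rollout_cost_and_grad`, `noise_scale`, `optimize`), and the rule that shrinks the noise when the gradient norm is large are all assumptions chosen for illustration. The summary does not say how MDPO actually maps gradient sensitivity to noise.

```python
import math
import random


def rollout_cost_and_grad(actions, target=1.0, dt=0.1):
    """Toy differentiable simulator (hypothetical): x_{t+1} = x_t + dt * tanh(u_t)."""
    x = 0.0
    for u in actions:
        x += dt * math.tanh(u)
    err = x - target
    cost = err * err
    # Analytic gradient of the terminal cost with respect to each action.
    grad = [2.0 * err * dt * (1.0 - math.tanh(u) ** 2) for u in actions]
    return cost, grad


def noise_scale(grad, base_sigma=0.5):
    """Assumed rule: inject more noise when the gradient is weak, less when it is strong."""
    norm = math.sqrt(sum(g * g for g in grad))
    return base_sigma / (1.0 + norm)


def optimize(horizon=20, steps=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    actions = [0.0] * horizon
    best_cost, _ = rollout_cost_and_grad(actions)
    best_actions = list(actions)
    for _ in range(steps):
        _, grad = rollout_cost_and_grad(actions)
        sigma = noise_scale(grad)
        # Stochastic exploration: evaluate the gradient at a perturbed action sequence.
        probe = [u + rng.gauss(0.0, sigma) for u in actions]
        _, probe_grad = rollout_cost_and_grad(probe)
        # Gradient step on the unperturbed actions using the probe's gradient.
        actions = [u - lr * g for u, g in zip(actions, probe_grad)]
        cost, _ = rollout_cost_and_grad(actions)
        if cost < best_cost:
            best_cost, best_actions = cost, list(actions)
    return best_cost, best_actions


if __name__ == "__main__":
    final_cost, _ = optimize()
    print(f"best cost found: {final_cost:.6f}")
```

Taking the gradient at a randomly perturbed point is one common way to smooth a rugged objective. Tying the perturbation size to the gradient norm is only one plausible reading of "adaptive"; the opposite choice (more noise where gradients are large or unstable) would be equally consistent with the summary.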