Power-Budgeted Underwater Vehicle Control via Constrained Reinforcement Learning
Researchers developed a constrained reinforcement learning approach for underwater vehicle control that explicitly budgets thruster power consumption, reducing energy use by 14-65% compared to unconstrained baselines without requiring manual tuning for each vehicle or task.
This research addresses a critical engineering challenge in autonomous underwater systems: balancing mission objectives with energy constraints. Traditional reinforcement learning approaches optimize task completion but often produce oscillatory control patterns that waste power. Penalizing energy use in the reward function is a known workaround, but the penalty weight must be tuned by hand and the right value varies per vehicle and task, so every new deployment repeats the search.
The constrained Markov decision process formulation treats power consumption as an explicit budget constraint rather than a hidden penalty term. The dual variable that enforces the budget is then optimized automatically, adapting to different vehicles and scenarios without human intervention. The PPO-Lagrangian algorithm, tested across twelve different configurations in the MarineGym simulator, shows consistent power reductions while maintaining task accuracy and producing smoother control outputs.
This approach matters because underwater vehicle missions are fundamentally limited by onboard battery capacity: extending endurance directly expands operational range and mission viability. The tuning-free nature of the solution reduces deployment friction and expertise barriers, making energy-efficient control more accessible to practitioners. For robotics applications beyond underwater vehicles, this constrained optimization framework could generalize to any energy-limited autonomous system facing similar task-power trade-offs, from aerial drones to mobile robots. The results suggest that treating physical limits as formal constraints rather than soft objectives gives better practical results while removing the hyperparameter search that penalty-weighted rewards require.
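To make the difference between a budget constraint and a penalty term concrete, the block below writes out a generic constrained MDP and its Lagrangian. The notation is an assumption for illustration and is not taken from the paper: r_t is the task reward, c_t is the per-step thruster power cost, d is the power budget, \lambda is the dual variable, and \eta is its step size. The paper's exact formulation (for example, averaged rather than discounted cost) may differ.

```latex
\begin{aligned}
&\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} r_t\Big]
\quad \text{s.t.} \quad
J_C(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} c_t\Big] \le d, \\
&\mathcal{L}(\pi, \lambda) = J_R(\pi) - \lambda\,\big(J_C(\pi) - d\big), \qquad \lambda \ge 0, \\
&\lambda \leftarrow \max\big(0,\; \lambda + \eta\,(J_C(\pi) - d)\big).
\end{aligned}
```

A penalty-weighted reward of the form r_t - w c_t fixes the weight w by hand. In the Lagrangian form, \lambda plays the same role but is adjusted from the measured gap between power use and the budget d, which is why only the budget has to be specified.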
- Constrained reinforcement learning reduces underwater vehicle thruster power consumption by 14-65% compared to unconstrained baselines.
- Explicit power budgeting eliminates the per-vehicle, per-task manual tuning required by traditional energy-penalty reward weighting.
- The PPO-Lagrangian algorithm automatically adapts a dual variable to meet specified power targets across different vehicles and missions.
- The approach preserves task accuracy while producing smoother, more energy-efficient control outputs in most tested scenarios.
- The framework could generalize beyond underwater vehicles to any autonomous system with energy constraints and competing objectives.
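The dual-variable adaptation mentioned in the takeaways can be sketched in a few lines of Python. This is a minimal illustration of a generic PPO-Lagrangian style multiplier update, not the authors' implementation: all function names are hypothetical, and `toy_average_power` is a made-up stand-in for measuring a policy's power use in a simulator such as MarineGym.

```python
def update_multiplier(lam, measured_cost, budget, lr):
    """One projected dual-ascent step.

    The multiplier rises when measured power exceeds the budget and
    falls otherwise; it is clipped at zero so it never rewards waste.
    """
    return max(0.0, lam + lr * (measured_cost - budget))


def penalized_advantage(reward_adv, cost_adv, lam):
    """Combine reward and cost advantages for the policy update.

    Dividing by (1 + lam) is a common rescaling that keeps the
    magnitude of the combined signal bounded as lam grows.
    """
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


def toy_average_power(lam):
    """Hypothetical stand-in for a rollout: power use falls as lam grows."""
    return 100.0 / (1.0 + lam)


def run_dual_ascent(budget, lr=0.05, steps=200):
    """Iterate the multiplier update against the toy power model."""
    lam = 0.0
    for _ in range(steps):
        power = toy_average_power(lam)
        lam = update_multiplier(lam, power, budget, lr)
    return lam, toy_average_power(lam)


if __name__ == "__main__":
    lam, power = run_dual_ascent(budget=40.0)
    print(f"multiplier={lam:.3f}, average power={power:.3f}")
```

In this toy setting the only quantity the user specifies is the budget; the multiplier settles wherever the measured power meets it. A fixed penalty weight would instead need to be re-tuned whenever the relationship between the weight and power use changes, which is the per-vehicle, per-task tuning the constrained formulation avoids.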