AINeutralarXiv – CS AI · 6h ago6/10
🧠
Baseline-Free Policy Optimization for Neural Combinatorial Optimization
Researchers propose Group Relative Policy Optimization (GRPO), a baseline-free training algorithm for neural combinatorial optimization that eliminates the need for maintaining frozen policy copies. Testing on TSP and CVRP benchmarks shows GRPO prevents training collapse seen in standard REINFORCE while achieving competitive solution quality, offering a more stable alternative for routing problem optimization.