AINeutralarXiv โ CS AI ยท 10h ago6/10
๐ง
StaRPO: Stability-Augmented Reinforcement Policy Optimization
Researchers propose StaRPO, a reinforcement learning framework that improves large language model reasoning by incorporating stability metrics alongside task rewards. The method uses Autocorrelation Function and Path Efficiency measurements to evaluate logical coherence and goal-directedness, demonstrating improved accuracy and reasoning consistency across four benchmarks.