AI · Bullish · arXiv – CS AI · Feb 27 · 6/104
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Researchers have developed Hierarchy-of-Groups Policy Optimization (HGPO), a reinforcement learning method that improves AI agents' performance on long-horizon tasks by addressing context inconsistency in stepwise advantage estimation. The authors report significant improvements over existing approaches on challenging agentic tasks using Qwen2.5 models.