AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning
Researchers propose GraphGPO, a reinforcement learning method that improves credit assignment in agentic tasks by aggregating trajectories into a state-transition graph instead of attributing credit only at the coarse, outcome-based trajectory level. The graph enables step-level credit assignment, and the authors report state-of-the-art performance on challenging benchmarks along with significantly improved training efficiency.
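
To make the idea of graph-based, step-level credit more concrete, here is a minimal Python sketch. It is not GraphGPO's actual algorithm, which this summary does not specify. It assumes states are hashable and that each trajectory carries a single outcome reward; the names `build_graph` and `step_credits`, and the rule "credit = change in mean outcome between consecutive states", are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def build_graph(trajectories):
    """Merge trajectories into one state-transition graph.

    trajectories: list of (states, outcome) pairs, where states is a list
    of hashable state identifiers and outcome is the final reward.
    Returns (values, edges): the mean outcome of trajectories visiting each
    state, and the number of times each transition was observed.
    """
    visits = defaultdict(int)
    total = defaultdict(float)
    edges = defaultdict(int)
    for states, outcome in trajectories:
        # Count each state once per trajectory, even if it is revisited.
        for s in set(states):
            visits[s] += 1
            total[s] += outcome
        for s, nxt in zip(states, states[1:]):
            edges[(s, nxt)] += 1
    values = {s: total[s] / visits[s] for s in visits}
    return values, dict(edges)


def step_credits(states, values):
    """Illustrative step-level credit: change in state value per transition."""
    return [values[nxt] - values[s] for s, nxt in zip(states, states[1:])]


# Toy data (made up): three trajectories sharing the state "start".
trajectories = [
    (["start", "a", "goal"], 1.0),
    (["start", "a", "fail"], 0.0),
    (["start", "b", "fail"], 0.0),
]
values, edges = build_graph(trajectories)
credits = step_credits(["start", "a", "goal"], values)
```

In this toy setup, the successful trajectory's two steps receive different credit because the shared graph knows that reaching "a" only sometimes leads to success, whereas a purely outcome-based scheme would give both steps the same reward of 1.0.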