AI · Neutral · arXiv – CS AI · 14h ago · 6/10
Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling
Researchers introduce Graph-Distance Contribution Reward (GDCR), a step-level credit assignment method for agentic search that scores each agent action by how much progress it makes toward answer nodes in a knowledge graph. Trajectory-level rewards score only the final outcome and so cannot assess the quality of intermediate steps. GDCR, combined with Step Advantage Policy Optimization (SAPO), is designed to close that gap, and the authors report strong results across multiple benchmarks.
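This summary does not give the GDCR formula, so the following is only a rough sketch of the general idea of rewarding a step by its progress toward answer nodes. It assumes an undirected knowledge graph stored as an adjacency dict, hop count as the distance measure, and a per-step reward equal to the reduction in distance to the nearest answer node. The function names, the toy graph, and the reward definition are all hypothetical and are not taken from the paper.

```python
from collections import deque


def distances_to_answers(graph, answer_nodes):
    """Multi-source BFS: hop distance from every reachable node to its nearest answer node.

    Assumes `graph` is an undirected adjacency dict (each edge listed in both directions).
    """
    dist = {node: 0 for node in answer_nodes}
    queue = deque(answer_nodes)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def step_rewards(graph, answer_nodes, visited_nodes):
    """Hypothetical step-level reward: how many hops closer each step moves the agent.

    Positive means the step approached an answer node, negative means it moved away.
    Nodes that cannot reach any answer node get a fallback distance of len(graph).
    """
    dist = distances_to_answers(graph, answer_nodes)
    fallback = len(graph)
    hops = [dist.get(node, fallback) for node in visited_nodes]
    return [prev - curr for prev, curr in zip(hops, hops[1:])]


# Toy graph (illustrative only): q - a - c - ans, with a dead-end branch q - b.
graph = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q"],
    "c": ["a", "ans"],
    "ans": ["c"],
}
rewards = step_rewards(graph, ["ans"], ["q", "b", "q", "a", "c", "ans"])
print(rewards)  # [-1, 1, 1, 1, 1]
```

In this toy run the detour into the dead-end node `b` receives a negative reward, while each step along the path to `ans` receives a positive one. A single trajectory-level reward would not separate the two.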