AINeutralarXiv – CS AI · 9h ago6/10
🧠
Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair
Researchers present a signal-reshaping framework for GRPO (Group Relative Policy Optimization) that improves code-agent reinforcement learning under weak feedback conditions. The approach combines layered rewards, process-level credit assignment, and execution-aware rollout governance to increase strict compile-and-semantic accuracy from 38.5% to 53.5% on agentic code repair tasks.