BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
BoostAPR is a new AI framework that improves automated program repair by combining reinforcement learning with dual reward models that identify which code edits actually fix bugs. The system reports gains on multiple benchmarks, including 40.7% on SWE-bench Verified, suggesting that finer-grained feedback can substantially improve an AI system's ability to repair software defects.
BoostAPR addresses a fundamental challenge in using reinforcement learning for code repair: the difficulty of identifying which specific edits contribute to fixing bugs when only end-to-end execution feedback is available. Traditional approaches suffer from sparse reward signals and coarse-grained assessments that leave the model uncertain about causality between changes and outcomes. The framework's innovation lies in its dual-model architecture, where line-level credit assignment operates at an intermediate granularity more natural to how developers think about code changes, while sequence-level assessment provides overall validation.
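To make the credit-assignment idea concrete, here is a minimal sketch of how one patch-wide reward could be spread over edited lines using line-level scores. It is an illustration under stated assumptions, not BoostAPR's actual formulation: the names `PatchLine`, `shaped_line_rewards`, and the mixing weight `alpha` are hypothetical, and the blending rule is one simple choice among many.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatchLine:
    text: str          # one edited line from a candidate patch
    line_score: float  # hypothetical line-level reward model score (higher = more helpful)


def shaped_line_rewards(
    lines: List[PatchLine],
    sequence_reward: float,
    alpha: float = 0.5,
) -> List[float]:
    """Distribute a sequence-level reward across lines.

    Assumption: each line gets a weight that mixes a uniform share with a
    share proportional to its line-level score. The weights sum to 1, so
    the per-line rewards always sum to the sequence-level reward.
    """
    if not lines:
        return []
    n = len(lines)
    # Negative scores are clipped to zero before normalization.
    total = sum(max(line.line_score, 0.0) for line in lines)
    rewards = []
    for line in lines:
        uniform_share = 1.0 / n
        # Fall back to a uniform split when no line has a positive score.
        credit_share = max(line.line_score, 0.0) / total if total > 0 else uniform_share
        weight = (1.0 - alpha) * uniform_share + alpha * credit_share
        rewards.append(sequence_reward * weight)
    return rewards


if __name__ == "__main__":
    patch = [
        PatchLine("if idx >= len(items):", 0.8),   # the line that fixes the bug
        PatchLine("    return None", 0.1),
        PatchLine("# tidy up whitespace", 0.1),
    ]
    for line, reward in zip(patch, shaped_line_rewards(patch, sequence_reward=1.0)):
        print(f"{reward:.3f}  {line.text}")
```

With `alpha` set to 0 every line receives the same share, which mirrors a purely sequence-level signal; raising `alpha` shifts credit toward the lines the line-level scorer considers responsible for the fix.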
This research builds on the broader trend of applying machine learning to software engineering tasks, following earlier work on neural program repair and the creation of training environments and benchmarks such as SWE-Gym. The progression from supervised learning to reinforcement learning with increasingly sophisticated reward structures reflects the field's maturation in handling the complexity of code generation at scale.
For the AI development community, BoostAPR's results matter because they show that careful architectural choices in reward modeling can yield large improvements: the reported gain over the baseline model is 22.9 percentage points. The cross-language transfer results (Python-to-Java) suggest the approach captures generalizable repair strategies rather than memorized patterns.
Looking ahead, the technique's applicability to other code generation tasks makes it relevant for developers building automated software maintenance systems. Subsequent work may explore whether similar dual-reward architectures benefit other structured generation problems beyond program repair, potentially influencing how reinforcement learning is applied to AI coding assistants.
- BoostAPR uses dual reward models to provide line-level credit assignment for code repairs, achieving 40.7% on SWE-bench Verified
- The framework combines supervised fine-tuning on execution-verified demonstrations with PPO optimization using granular feedback signals
- Strong cross-language transfer results (24.8% on Defects4J Python-to-Java) indicate learned repair strategies generalize beyond training data
- Line-level credit allocation at intermediate granularity proves more effective than sequence-level rewards alone for identifying critical edits
- Results are competitive with open-source models while maintaining interpretability about which code regions drive successful repairs