Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents
Researchers present PROBE, a framework that improves how AI software engineering agents recover from failures by converting runtime telemetry into structured diagnoses and bounded recovery guidance. The system achieves 65% diagnosis accuracy and 21.8% recovery rates on previously unresolved cases, with a prototype deployed at Microsoft showing practical viability without disrupting existing workflows.