AIBullisharXiv – CS AI · Apr 147/10
🧠
How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks
Researchers demonstrate that modern large language models can significantly improve code generation accuracy through iterative self-repair—feeding execution errors back to the model for correction—achieving 4.9-30.0 percentage point gains across benchmarks. The study reveals that instruction-tuned models succeed with prompting alone even at 8B scale, with Gemini 2.5 Flash reaching 96.3% pass rates on HumanEval, though logical errors remain substantially harder to fix than syntax errors.
🧠 Gemini🧠 Llama