
Why Retrying Fails: Context Contamination in LLM Agent Pipelines

arXiv – CS AI | Zhanfu Yang

AI Summary

Researchers introduce the Context-Contaminated Restart Model (CCRM) to formally explain why LLM agents fail at higher rates when retrying tasks after errors: failed attempts pollute the context window, raising subsequent error rates by as much as 7.1x. The model provides closed-form formulas for success probability and optimal pipeline depth allocation, and quantifies the exact benefit of clearing context before retry attempts.

Analysis

This research addresses a critical but previously unquantified problem in LLM agent systems: context contamination degrades performance during retries. When agents attempt multi-step tool-use tasks and fail, the contaminated context from the failed attempt remains visible during the next try, systematically elevating error rates beyond baseline levels. The CCRM framework provides mathematical rigor to what practitioners have observed empirically, offering five core theoretical contributions including exact success probability formulas and optimal budget allocation strategies.

The work builds on growing recognition that LLM agents, despite impressive capabilities, struggle with reliability in multi-step reasoning. Previous research assumed independent error rates across attempts (the IID model), but validation on real-world SWE-bench data shows this assumption dramatically overestimates performance: it predicts a 98.6% pass@3 success rate versus the actual 81.2%. This 17.4-percentage-point gap reflects a cascading penalty in which contamination pushes the post-failure error rate epsilon_1 to 7.1x the baseline epsilon_0.
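The size of that gap is easy to reproduce with a toy calculation. Below, a minimal sketch compares the IID pass@k formula against a model where every retry after the first failure runs at a degraded success probability. All numbers are illustrative, not taken from the paper: p0 is back-solved so the IID model predicts roughly 98.6% pass@3, and p1 is fitted so the contaminated model lands near the reported 81.2%.

```python
def pass_at_k_iid(p, k):
    """IID assumption: every attempt succeeds independently with probability p."""
    return 1 - (1 - p) ** k

def pass_at_k_contaminated(p0, p1, k):
    """After the first failure, contaminated context drops the per-attempt
    success probability from p0 to p1 for every subsequent retry."""
    success = p0
    fail = 1 - p0
    for _ in range(k - 1):
        success += fail * p1
        fail *= (1 - p1)
    return success

# Illustrative values (assumed, not from the paper).
p0 = 0.759   # chosen so the IID model predicts ~98.6% pass@3
p1 = 0.116   # fitted so the contaminated model gives ~81.2% pass@3
print(round(pass_at_k_iid(p0, 3), 3))                 # ≈ 0.986
print(round(pass_at_k_contaminated(p0, p1, 3), 3))    # ≈ 0.812
```

The fitted p1 makes the point concrete: matching the observed pass@3 requires retries that succeed far less often than first attempts, which is exactly the contamination effect the IID model cannot express.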

For developers and AI systems engineers, these findings directly inform deployment decisions. The optimal pipeline depth formula T* = sqrt(B * log(1/(1-epsilon_1)) / log(1/(1-epsilon_0))) lets engineers balance pipeline depth against retry budgets rather than blindly increasing attempts. The dominance theorem quantifying clean-restart benefits suggests that architectures which explicitly clear context between retry attempts could yield substantial improvements. The information-theoretic lower bounds show that CCRM characterizes the problem space tightly, leaving no room for dramatically better strategies.
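The depth formula quoted above is straightforward to evaluate. A minimal sketch, with the budget B and both error rates chosen purely for illustration (the 7.1x ratio is the one reported in the article; the baseline 2% per-step error is an assumption):

```python
import math

def optimal_depth(B, eps0, eps1):
    """Closed-form pipeline depth from the article:
    T* = sqrt(B * log(1/(1-eps1)) / log(1/(1-eps0)))."""
    return math.sqrt(B * math.log(1 / (1 - eps1)) / math.log(1 / (1 - eps0)))

# Assumed values: 2% per-step error on a clean context,
# 7.1x higher on a contaminated one, budget of 100 steps.
eps0 = 0.02
eps1 = 7.1 * eps0
print(round(optimal_depth(100, eps0, eps1), 1))  # ≈ 27.5
```

Note the intuition the formula encodes: as contaminated retries get worse relative to clean attempts (eps1 >> eps0), the optimal depth grows, favoring fewer, deeper pipelines over many shallow retries.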

Future work likely extends to adaptive retry strategies, context filtering techniques, and agent architectures explicitly designed to mitigate contamination. This foundational analysis enables principled optimization of agent reliability systems.

Key Takeaways
  • LLM agents experience 7.1x higher error rates during retries due to contaminated context from previous failed attempts, not independent failures
  • Existing IID performance models overestimate real-world success rates by 17+ percentage points on code generation tasks
  • Optimal pipeline depth follows a closed-form solution balancing task complexity against retry budgets, enabling principled architecture design
  • Clearing context before retry attempts provides quantifiable benefits that can inform system optimization strategies
  • The CCRM framework is mathematically tight and validated empirically, providing reliable predictions for multi-step agent performance
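The clean-restart benefit in the takeaways above can be checked with a toy Monte Carlo simulation. Everything here is an illustrative sketch, not the paper's model: a task is a fixed number of steps that must all succeed, and a failure either contaminates the next attempt (raising the per-step error rate) or is followed by a clean restart at the baseline rate.

```python
import random

def run_task(steps, eps):
    """A task succeeds only if every step succeeds."""
    return all(random.random() >= eps for _ in range(steps))

def attempt_with_retries(steps, eps0, eps1, budget, clean_restart):
    eps = eps0
    for _ in range(budget):
        if run_task(steps, eps):
            return True
        # After a failure, a contaminated context raises the error rate;
        # a clean restart keeps it at the baseline.
        eps = eps0 if clean_restart else eps1
    return False

random.seed(0)
N = 20_000
steps, eps0, eps1, budget = 10, 0.02, 0.142, 3  # assumed parameters
dirty = sum(attempt_with_retries(steps, eps0, eps1, budget, False) for _ in range(N)) / N
clean = sum(attempt_with_retries(steps, eps0, eps1, budget, True) for _ in range(N)) / N
print(f"contaminated retries: {dirty:.3f}, clean restarts: {clean:.3f}")
```

Under these assumed parameters the clean-restart policy wins by several percentage points of pass@3, which is the qualitative shape of the dominance theorem the article describes.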