Self-Healing Agentic Orchestrators for Reliable Tool-Augmented Large Language Model Systems
Researchers present a self-healing orchestration framework for tool-augmented large language models that treats reliability as a bounded runtime control problem: the orchestrator maps failure signals to recovery actions and verifies the results, reaching 98.8% task success. The approach outperforms retry-only and full-replanning baselines across multiple benchmarks, and its advantage is most pronounced when recovery budgets are constrained.
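
To make the control loop described above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a small set of failure signals, a lookup table from signal to recovery action, a caller-supplied verifier, and a fixed recovery budget. Every name here (`Failure`, `ToolError`, `RECOVERY_POLICY`, `run_with_healing`, `flaky_search`) and every specific recovery action is hypothetical and chosen only for illustration.

```python
from enum import Enum, auto
from typing import Any, Callable, Dict, Optional, Tuple


class Failure(Enum):
    """Hypothetical failure signals; the source does not enumerate its own."""
    TIMEOUT = auto()
    BAD_ARGUMENTS = auto()
    UNVERIFIED = auto()  # the tool returned, but the result failed verification


class ToolError(Exception):
    """Raised by a tool to report a classified failure signal."""

    def __init__(self, signal: Failure) -> None:
        super().__init__(signal.name)
        self.signal = signal


# A recovery action takes the current call arguments and returns repaired ones.
Recovery = Callable[[Dict[str, Any]], Dict[str, Any]]

# Assumed policy table: each failure signal maps to exactly one recovery action.
RECOVERY_POLICY: Dict[Failure, Recovery] = {
    Failure.TIMEOUT: lambda a: {**a, "timeout_s": a.get("timeout_s", 1) * 2},
    Failure.BAD_ARGUMENTS: lambda a: {k: v for k, v in a.items() if v is not None},
    Failure.UNVERIFIED: lambda a: {**a, "broaden": True},
}


def run_with_healing(
    tool: Callable[..., Any],
    args: Dict[str, Any],
    verify: Callable[[Any], bool],
    budget: int = 2,
) -> Tuple[Optional[Any], int]:
    """Call `tool`, applying at most `budget` recovery actions.

    Returns (result, recoveries_used); result is None if the budget ran out
    before a verified result was produced.
    """
    recoveries_used = 0
    while True:
        try:
            result = tool(**args)
            if verify(result):
                return result, recoveries_used
            signal = Failure.UNVERIFIED
        except ToolError as err:
            signal = err.signal
        if recoveries_used >= budget:
            return None, recoveries_used  # bounded: give up instead of looping
        args = RECOVERY_POLICY[signal](args)
        recoveries_used += 1


def flaky_search(query: str, timeout_s: int = 1, broaden: bool = False) -> list:
    """Toy tool: times out on short timeouts, returns nothing unless broadened."""
    if timeout_s < 2:
        raise ToolError(Failure.TIMEOUT)
    return ["hit"] if broaden else []
```

With the toy `flaky_search` tool and a verifier that requires a non-empty result, a budget of 2 is enough to recover from both the timeout and the empty result, while a budget of 1 is exhausted after the first repair and the call returns `None`. Keeping the budget as an explicit parameter is what makes the loop bounded, in contrast to an open-ended retry.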