ExecTune: Effective Steering of Black-Box LLMs with Guide Models
Researchers introduce ExecTune, a training methodology for optimizing black-box LLM systems in which a guide model generates strategies that a core model executes. The approach improves accuracy by up to 9.2% while cutting inference costs by 22.4%, enabling smaller models such as Claude Haiku to match larger models at significantly lower computational expense.
ExecTune addresses a critical economic reality in deployed LLM systems: inference costs often dwarf training expenses, making efficiency optimization essential for sustainable scaling. The research formalizes Guide-Core Policies (GCoP)—systems where a lightweight guide model generates structured strategies for execution by a black-box core—and identifies why existing approaches generate brittle, inefficient outputs. By analyzing guide-averaged executability, the authors demonstrate that current methods fail to account for deployment constraints, leading to wasted computation and suboptimal performance.
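The guide-averaged executability idea can be made concrete with a small sketch. This is an illustrative stand-in, not the paper's implementation: here a "strategy" is treated as executable if it parses as valid Python, whereas a real GCoP system would have the black-box core attempt to run it. The function names (`is_executable`, `guide_averaged_executability`) are hypothetical.

```python
import ast
from statistics import mean

def is_executable(strategy: str) -> bool:
    """Proxy check for whether the core could execute a guide strategy.

    Illustrative assumption: a strategy counts as executable if it is
    syntactically valid Python. A deployed system would instead submit
    it to the black-box core and observe execution success.
    """
    try:
        ast.parse(strategy)
        return True
    except SyntaxError:
        return False

def guide_averaged_executability(strategies: list[str]) -> float:
    """Fraction of guide-sampled strategies that the core can execute,
    averaged over the guide's samples for a given task."""
    return mean(1.0 if is_executable(s) else 0.0 for s in strategies)
```

For example, a guide that samples one valid and one malformed strategy scores 0.5, making the executability bottleneck directly measurable per task.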
The methodology builds on established techniques (teacher-guided sampling, supervised fine-tuning, reinforcement learning) but applies them specifically to optimize syntactic validity and execution success rather than raw accuracy. This shift in optimization targets directly addresses the executability bottleneck. Benchmarking results validate the approach: Claude Haiku 3.5 achieves parity with, or exceeds, Sonnet 3.5 on mathematical reasoning and code generation, a substantial step toward cost-efficient inference.
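The shift in optimization targets could be expressed as a shaped reward for guide training. The weights and structure below are assumptions for illustration (the paper's exact reward is not given here): validity and execution success are rewarded before final-answer accuracy, with a mild penalty for token usage to reflect the cost-sensitive objective.

```python
def exec_reward(parses: bool, ran_ok: bool, correct: bool,
                tokens_used: int, token_budget: int = 512,
                cost_weight: float = 0.1) -> float:
    """Hypothetical shaped reward for a guide policy.

    Rewards are gated: execution success only counts if the strategy
    parses, and accuracy only counts if it also ran. The weights
    (0.2 / 0.3 / 0.5) and the cost term are illustrative choices.
    """
    r = 0.0
    if parses:
        r += 0.2          # syntactic validity
        if ran_ok:
            r += 0.3      # execution success on the black-box core
            if correct:
                r += 0.5  # final-answer accuracy
    # Penalize inference cost relative to a token budget.
    r -= cost_weight * min(tokens_used / token_budget, 1.0)
    return r
```

The gating reflects the paper's ordering of targets: an accurate answer that the core cannot execute is unreachable, so validity and executability are optimized first.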
For AI infrastructure stakeholders, this work has immediate implications. Organizations running expensive LLM APIs can reduce operational costs while maintaining quality through better-composed agent architectures. The modular adaptation capability—updating guides without retraining cores—enables continuous improvement without expensive redeployment cycles. The 38% cost reduction versus Sonnet 4 while maintaining near-parity accuracy suggests that future AI applications may increasingly rely on orchestrated multi-model systems rather than monolithic large models.
- ExecTune optimizes guide-model strategies for reliable execution by black-box LLMs, improving accuracy 9.2% while cutting costs 22.4%
- Claude Haiku 3.5 now matches Sonnet 3.5 performance on math and code tasks, achieving Sonnet 4 quality at 38% lower inference cost
- Guide-averaged executability emerges as the key performance bottleneck in composed LLM systems, previously overlooked by existing methods
- Modular guide adaptation enables continuous improvement without retraining core models, supporting efficient system evolution
- Cost-sensitive optimization directly addresses the economics of black-box API deployment where recurring inference costs exceed training expenses