Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
Researchers introduced Uno-Orchestra, an orchestration framework for multi-agent LLM systems that dynamically decides when to decompose a task and which model-primitive pairs to assign to it. It achieves 77% macro pass@1 accuracy across 13 benchmarks while cutting per-query inference cost by roughly an order of magnitude compared to existing approaches.
Uno-Orchestra addresses a fundamental inefficiency in current LLM multi-agent systems: the inability to jointly optimize task decomposition depth, worker selection, and inference budgets. Traditional approaches rely on static routing strategies, either flat per-query assignments or manually engineered task breakdowns, which spend compute where it is not needed and miss accuracy gains where delegation would help. This research demonstrates that a unified routing policy, learned through reinforcement learning grounded in real interaction data, improves on both accuracy and cost.
By training on curated RL trajectories that reflect actual worker capabilities and interactions, Uno-Orchestra learns when a task genuinely benefits from delegation and when direct execution suffices. The reported 16% improvement over the strongest baseline, combined with a roughly 10x cost reduction, indicates that the framework navigates the accuracy-efficiency trade-off that has constrained multi-agent LLM deployment. The breadth of the evaluation, which spans math, code, knowledge retrieval, long-context reasoning, and tool-use scenarios, suggests the approach generalizes across diverse problem domains.
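To make the delegation-versus-direct-execution trade-off concrete, here is a minimal sketch of a cost-aware choice between candidate actions. It is not Uno-Orchestra's learned policy: the `Option` type, the accuracy and cost estimates, and the fixed `cost_weight` are all hypothetical stand-ins for quantities that the framework would learn from RL trajectories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One candidate action for the router (all fields are illustrative)."""
    name: str            # e.g. "direct" or "delegate:large-model"
    est_accuracy: float  # assumed probability of solving the task, in [0, 1]
    est_cost: float      # assumed inference cost, in arbitrary units


def choose_action(options: list[Option], cost_weight: float) -> Option:
    """Pick the option with the highest accuracy-minus-weighted-cost score."""
    if not options:
        raise ValueError("need at least one option")
    return max(options, key=lambda o: o.est_accuracy - cost_weight * o.est_cost)


# Hypothetical numbers: delegation is more accurate but ten times as expensive.
options = [
    Option("direct", est_accuracy=0.70, est_cost=1.0),
    Option("delegate:large-model", est_accuracy=0.85, est_cost=10.0),
]

# A high cost weight favors direct execution; a low one favors delegation.
cheap_choice = choose_action(options, cost_weight=0.05)
accurate_choice = choose_action(options, cost_weight=0.001)
print(cheap_choice.name, accurate_choice.name)
```

A hand-set weight like this is exactly the kind of static rule the article contrasts with a learned policy; the sketch only shows why the same task can warrant different routing decisions depending on how cost is valued.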
For the AI infrastructure space, this represents progress toward more economically viable agentic systems. Current multi-agent deployments often incur prohibitive inference costs that limit commercial viability. Uno-Orchestra's cost efficiency opens possibilities for deploying sophisticated agent systems at scale. The methodology of learning routing policies from real worker interactions also establishes a replicable pattern for future orchestration research, potentially influencing how companies build internal AI systems and optimize their model routing strategies. This work will likely inform product decisions around multi-model inference optimization and dynamic task routing in enterprise AI platforms.
- Uno-Orchestra achieves 77% macro pass@1 accuracy, outperforming 22 baselines with roughly 10x lower per-query inference costs.
- The framework learns joint optimization of task decomposition, worker selection, and inference budgets through RL training on curated trajectories.
- Unified routing policies outperform rigid orchestration approaches by dynamically deciding when to decompose tasks versus execute directly.
- Performance improvements span diverse domains including math, code, knowledge tasks, long-context reasoning, and tool-use scenarios.
- The approach establishes a replicable methodology for training orchestration policies on real worker interaction data.
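For readers unfamiliar with the headline metric, the sketch below shows how a macro-averaged pass@1 score is conventionally computed: pass@1 is taken per benchmark as the fraction of problems solved on the first attempt, and the macro score is the unweighted mean over benchmarks. The benchmark names and results are made up for illustration, and the paper's exact evaluation protocol is not described in this summary.

```python
def pass_at_1(results: list[bool]) -> float:
    """Fraction of problems solved on the first attempt."""
    if not results:
        raise ValueError("benchmark has no results")
    return sum(results) / len(results)


def macro_pass_at_1(per_benchmark: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-benchmark pass@1, so each benchmark counts equally."""
    if not per_benchmark:
        raise ValueError("need at least one benchmark")
    scores = [pass_at_1(results) for results in per_benchmark.values()]
    return sum(scores) / len(scores)


# Hypothetical first-attempt outcomes for two benchmarks of different sizes.
example = {
    "math": [True, True, False, False],  # pass@1 = 0.5
    "code": [True],                      # pass@1 = 1.0
}
macro = macro_pass_at_1(example)
print(macro)
```

Macro averaging keeps a large benchmark from dominating the aggregate: pooling all five problems in the example would instead give 3/5.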