Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
Researchers propose a critique-and-routing controller for multi-agent LLM systems that iteratively refines outputs through sequential decision-making rather than one-shot routing. The method uses reinforcement learning with agent-utilization constraints to approach the performance of the strongest agent while relying on any single agent for fewer than 25% of inference calls, advancing coordination efficiency in heterogeneous AI systems.
This research addresses a fundamental limitation in current multi-agent LLM architectures: static routing decisions that prevent iterative improvement. Traditional controllers select a single model and return its output immediately, missing opportunities for refinement through critique and rerouting. The proposed critique-and-routing controller reformulates coordination as a sequential decision problem, allowing the system to evaluate intermediate drafts, determine whether to continue refining, and select the optimal next agent for further work.
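The refinement loop described above can be sketched as a simple control flow: the controller inspects the current draft, decides whether to stop or continue, and if continuing, picks the next agent. The function and interface names below are illustrative assumptions, since the summary does not specify the paper's actual APIs.

```python
def critique_and_route(task, agents, controller, max_rounds=4):
    """Iterative critique-and-routing loop (illustrative sketch).

    `agents` maps agent names to callables that take (task, draft) and
    return a refined draft; `controller` inspects the current draft and
    returns either ("stop",) or ("route", agent_name). All interfaces
    here are hypothetical, not the paper's actual method.
    """
    draft = None
    for round_idx in range(max_rounds):
        # The controller evaluates the intermediate draft and decides
        # whether further refinement is worth another agent call.
        action = controller(task, draft, round_idx)
        if action[0] == "stop":
            break
        # Route to the selected agent, which refines (or, on the first
        # round, produces) the draft.
        draft = agents[action[1]](task, draft)
    return draft
```

In a trained system, `controller` would be the learned policy; here it is any callable with that signature, which also makes the loop easy to unit-test with toy agents.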
The technical approach formulates this challenge as a finite-horizon Markov Decision Process with explicit constraints on agent utilization—a practical consideration that prevents overuse of expensive or specialized models. By optimizing controller decisions via policy gradients under a Lagrangian-relaxed objective, the authors enable adaptive routing that balances quality with computational efficiency. This builds on recent trends in agentic AI systems that move beyond simple routing toward dynamic orchestration.
The empirical results demonstrate substantial practical value. Across seven reasoning benchmarks and multiple heterogeneous model configurations, the method consistently outperforms existing baselines while using any single agent for fewer than 25% of total inference calls. This efficiency gain matters significantly for production deployments where inference costs directly impact operational expenses and latency constraints.
The work reflects growing recognition that LLM system design requires intelligent coordination mechanisms beyond prompt engineering. As organizations deploy increasingly complex multi-model architectures—combining specialized models for different reasoning types, modalities, or domains—controllers that enable iterative refinement become critical infrastructure. Future research likely extends these techniques to handle dynamic agent pools, heterogeneous cost structures, and real-time quality metrics.
- Critique-and-routing controllers enable iterative refinement in multi-agent LLM systems instead of static one-shot routing decisions.
- Formulating agent coordination as an MDP with utilization constraints optimizes both output quality and computational efficiency.
- The method achieves near-strongest-agent performance while reducing per-agent call frequency to under 25% of total inference.
- Policy gradient optimization under Lagrangian relaxation provides a scalable framework for balancing multiple competing objectives.
- Results across seven benchmarks demonstrate consistent improvements over existing multi-agent coordination baselines.