A Regime Theory of Controller Class Selection for LLM Action Decisions
Researchers propose a regime theory framework for selecting controller classes in language and vision-language models, determining whether AI systems should answer directly, retrieve evidence, defer to stronger models, or abstain. The work demonstrates that model expressivity doesn't uniformly improve performance in finite samples, and provides a principled method to match controller complexity to data availability across multiple benchmarks.
This research addresses a fundamental operational challenge in deploying large language models: deciding when a model should handle a query independently versus when it should route the request elsewhere or decline to answer. Rather than assuming that more sophisticated routing mechanisms always outperform simpler ones, the authors establish that optimal controller selection depends on finite-sample constraints and the reliability of uncertainty signals.
The theoretical contribution centers on organizing controllers into a nested lattice of increasing complexity and deriving distribution-dependent thresholds that determine which class performs best. This reflects a broader tension in machine learning between model capacity and data efficiency: when uncertainty signals are unreliable due to insufficient samples, simpler partition routers can outperform instance-level controllers despite the latter's greater sophistication. The Bernstein-tight bounds supply both theoretical rigor and practical, computable guarantees.
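To make the contrast concrete, here is a minimal sketch of two controller classes at different points on such a lattice. The action names come from the paper's setup (answer, retrieve, defer, abstain), but the categories, thresholds, and function names are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of two controller classes of different expressivity.
# A partition router assigns one fixed action per coarse query category,
# so it can be fit from few samples; an instance-level controller thresholds
# a per-query confidence score, so it needs a reliable uncertainty signal.
# Categories and threshold values below are illustrative only.

ACTIONS = ("answer", "retrieve", "defer", "abstain")

def partition_router(category: str) -> str:
    """Coarse router: one action per query category."""
    table = {"factoid": "answer", "multi_hop": "retrieve", "adversarial": "defer"}
    return table.get(category, "abstain")

def instance_controller(confidence: float) -> str:
    """Fine-grained controller: decides per query from a confidence score."""
    if confidence >= 0.85:
        return "answer"
    if confidence >= 0.60:
        return "retrieve"
    if confidence >= 0.35:
        return "defer"
    return "abstain"

print(partition_router("factoid"))   # -> answer
print(instance_controller(0.20))     # -> abstain
```

The instance-level controller is strictly more expressive, but if the confidence score is poorly calibrated on small samples, its per-query decisions can be worse than the router's coarse but stable per-category choices.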
For AI system designers and practitioners, this framework offers concrete guidance on resource allocation and architecture selection. Rather than defaulting to the most complex routing mechanism, developers can estimate three data-dependent bottlenecks and systematically select appropriate controller classes. The empirical validation across diverse benchmarks—SMS-Spam, HallusionBench, A-OKVQA, FOLIO, and TextVQA—demonstrates that theoretical predictions align with observed performance patterns.
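The selection procedure can be caricatured as climbing the lattice only while the estimated gain of a richer class survives its finite-sample penalty. The paper's three specific bottlenecks are not detailed here, so the sketch below substitutes a generic Bernstein-style deviation term; the function names, the penalty form, and the example numbers are all assumptions for illustration:

```python
# Hypothetical selection rule, not the paper's exact criterion: among nested
# controller classes ordered simple -> expressive, adopt a richer class only
# while its estimated gain over the previous class exceeds a Bernstein-style
# finite-sample penalty that shrinks as the sample count n grows.
import math

def bernstein_penalty(variance: float, range_b: float, n: int,
                      delta: float = 0.05) -> float:
    """Bernstein-style deviation bound for a mean estimated from n samples."""
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + range_b * log_term / (3.0 * n)

def select_class(classes, n: int) -> str:
    """classes: list of (name, estimated_gain_over_previous, variance),
    ordered from simplest to most expressive. Stop climbing once a gain
    no longer clears its penalty."""
    chosen = classes[0][0]
    for name, gain, variance in classes[1:]:
        if gain > bernstein_penalty(variance, 1.0, n):
            chosen = name
        else:
            break
    return chosen

# Illustrative numbers: the instance-level class adds a small true gain
# with higher variance, so it only pays off once n is large.
classes = [("constant", 0.00, 0.00),
           ("partition", 0.04, 0.02),
           ("instance", 0.01, 0.05)]
print(select_class(classes, n=200))     # -> partition
print(select_class(classes, n=50000))   # -> instance
```

The design point matches the paper's thesis: nothing about the richer class is wrong, but at n=200 its estimated advantage is smaller than the statistical noise in estimating it, so a principled rule stops at the partition router.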
The availability of open-source code enables broader adoption and testing. This work will likely influence how teams design inference systems balancing accuracy, latency, and computational cost. Future development should focus on real-time estimation of these bottlenecks in production environments and extending the framework to multi-stage routing decisions.
- More expressive controller classes don't uniformly improve performance; optimal selection depends on finite-sample availability and signal reliability.
- A nested lattice framework with three estimable bottlenecks enables principled controller class selection, matching theoretical predictions to empirical results.
- Partition routers can outperform instance-level controllers at scale when reliable uncertainty signals are unavailable, despite their apparent simplicity.
- The Bernstein-tight thresholds provide both theoretical guarantees and information-theoretic lower bounds for controller selection.
- Validated across five benchmarks with an open-source implementation, enabling practical deployment in AI routing systems.