#llm-routing News & Analysis

8 articles tagged with #llm-routing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

8 articles

AINeutralarXiv – CS AI · Jun 56/10

🧠

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

Researchers introduce MetaRouter, a meta-learning framework that optimizes Large Language Model routing by learning individual users' implicit cost-performance preferences through minimal interaction. The system enables personalized query routing across multiple models, balancing expense reduction with performance maintenance more effectively than existing methods.

AINeutralarXiv – CS AI · Jun 26/10

🧠

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

Researchers introduce RASER, a cost-efficient routing system for multi-hop question-answering that reduces token consumption by 51-59% compared to always-escalating methods while maintaining competitive accuracy. The system leverages six features from one-shot retrieval to intelligently decide whether additional retrieval rounds are necessary, eliminating wasteful LLM calls.

AIBullisharXiv – CS AI · Jun 16/10

🧠

OrcaRouter: A Production-Oriented LLM Router with Hybrid Offline-Online Learning

OrcaRouter is a production-ready LLM routing system that uses contextual bandits and hybrid offline-online learning to intelligently direct requests to the most appropriate language model. The system ranked second on the RouterArena leaderboard with 75.54% accuracy while maintaining low inference costs of $1.00 per 1,000 queries.

AINeutralarXiv – CS AI · May 126/10

🧠

Route by State, Recover from Trace: STAR with Failure-Aware Markov Routing for Multi-Agent Spatiotemporal Reasoning

Researchers present STAR, a failure-aware routing framework for multi-agent AI systems that handles spatiotemporal reasoning tasks by intelligently routing between specialist agents based on typed failure states rather than generic success/failure signals. The system learns recovery transitions from execution traces and demonstrates improved performance across multiple benchmarks, suggesting that explicit failure-aware routing is more effective than implicit language-based decision-making in complex reasoning tasks.

AINeutralarXiv – CS AI · May 116/10

🧠

Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts

A comprehensive empirical study reveals that reported inefficiencies in multi-LLM routing systems are substantially inflated by evaluation artifacts rather than genuine model limitations. Researchers found that LLM-as-a-judge biases, output truncation, and format mismatches account for a significant portion of measured failures, suggesting current routing cost-quality tradeoff estimates significantly overstate the actual unsolvability ceiling.

🧠 Llama

AINeutralarXiv – CS AI · May 96/10

🧠

A Regime Theory of Controller Class Selection for LLM Action Decisions

Researchers propose a regime theory framework for selecting controller classes in language and vision-language models, determining whether AI systems should answer directly, retrieve evidence, defer to stronger models, or abstain. The work demonstrates that model expressivity doesn't uniformly improve performance in finite samples, and provides a principled method to match controller complexity to data availability across multiple benchmarks.

AIBullisharXiv – CS AI · Apr 206/10

🧠

Privacy-Preserving LLMs Routing

Researchers propose PPRoute, a privacy-preserving framework for LLM routing that uses Secure Multi-Party Computation (MPC) to protect user data while dynamically selecting between model providers. The system achieves 20x speedup over naive MPC implementations through optimized encoder inference, multi-step model training, and an efficient Top-k algorithm, maintaining routing quality without sacrificing privacy.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

Researchers propose AMRO-S, a new routing framework for multi-agent LLM systems that uses ant colony optimization to improve efficiency and reduce costs. The system addresses key deployment challenges like high inference costs and latency while maintaining performance quality through semantic-aware routing and interpretable decision-making.