arXiv – CS AI
Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts
An empirical study finds that much of the inefficiency reported for multi-LLM routing systems comes from evaluation artifacts rather than genuine model limitations. The researchers attribute a substantial share of measured failures to LLM-as-a-judge biases, output truncation, and format mismatches. This suggests that current estimates of the routing cost-quality tradeoff overstate the true unsolvability ceiling.