AI · Neutral · arXiv – CS AI · 7h ago · 6/10
UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling
UniScale is a unified framework that jointly optimizes model routing and test-time scaling for large language model inference, balancing output quality against computational cost. It learns online, using contextual multi-armed bandits to adapt its inference policy dynamically, and achieves finer-grained performance improvements than existing approaches that handle routing and scaling separately.
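
To make the idea of a joint routing and scaling policy concrete, here is a minimal sketch of a contextual bandit whose arms are (model, sample count) pairs. It is not UniScale's algorithm: the epsilon-greedy rule, the linear reward model, the model names, the relative costs, and the `cost_weight` trade-off are all assumptions chosen for illustration.

```python
import random
from itertools import product

# Hypothetical setup: two models with assumed relative costs per sampled answer,
# and two test-time scaling levels (number of sampled answers).
MODELS = {"small": 1.0, "large": 5.0}
SAMPLE_COUNTS = [1, 4]
# One arm per joint decision, so routing and scaling are chosen together.
ARMS = list(product(MODELS, SAMPLE_COUNTS))


class JointBandit:
    """Epsilon-greedy contextual bandit with one linear reward model per arm."""

    def __init__(self, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        self.weights = {arm: [0.0] * n_features for arm in ARMS}

    def predict(self, arm, context):
        # Estimated reward of this arm for the given query features.
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ARMS)
        return max(ARMS, key=lambda arm: self.predict(arm, context))

    def update(self, arm, context, reward):
        # One stochastic gradient step on the squared prediction error.
        error = reward - self.predict(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x for w, x in zip(self.weights[arm], context)
        ]


def reward(quality, arm, cost_weight=0.02):
    """Quality minus a cost penalty; cost_weight is an assumed trade-off knob."""
    model, n_samples = arm
    return quality - cost_weight * MODELS[model] * n_samples


if __name__ == "__main__":
    bandit = JointBandit(n_features=2)
    env = random.Random(1)
    for _ in range(1000):
        difficulty = env.random()
        context = [1.0, difficulty]  # bias term plus a toy difficulty feature
        arm = bandit.select(context)
        # Toy simulated quality: the large model and more samples help on hard queries.
        boost = (0.5 if arm[0] == "large" else 0.0) + 0.1 * (arm[1] - 1)
        quality = min(1.0, 1.0 - difficulty + boost * difficulty)
        bandit.update(arm, context, reward(quality, arm))
    print(bandit.select([1.0, 0.1]), bandit.select([1.0, 0.9]))
```

Treating each (model, sample count) pair as a single arm is what makes the decision joint: the policy can learn, for example, that several samples from a small model beat one sample from a large model on some queries, which a router and a scaling controller tuned separately could miss.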