arXiv – CS AI · 9h ago
Regulating Branch Parallelism in LLM Serving
Researchers introduce TAPER, an admission controller for parallel branch execution in LLM serving systems. It dynamically regulates how many concurrent decoding branches each request may run per step, weighing the throughput gained from extra branches against the degradation they cause for co-batched requests. The authors report a 1.77x improvement in goodput over conservative baselines.
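To make the idea of per-step branch admission concrete, here is a minimal sketch in Python. It is not TAPER's policy, which this summary does not describe. It assumes a hypothetical fixed per-step slot budget (`step_budget`) and a hypothetical function name (`admit_branches`), guarantees every request one branch, and hands out any spare slots round-robin up to what each request asked for.

```python
def admit_branches(requested, step_budget):
    """Decide how many decoding branches each request may run this step.

    requested:   dict mapping request id -> branches it would like (assumed >= 1)
    step_budget: hypothetical cap on total branches in the batch for one step
    """
    if step_budget < len(requested):
        raise ValueError("budget must cover one branch per request")

    # Every request keeps at least one branch so no one is starved.
    granted = {rid: 1 for rid in requested}
    spare = step_budget - len(requested)

    # Hand out remaining slots one at a time, round-robin, so a single
    # wide request cannot crowd out its co-batched neighbours.
    while spare > 0:
        progressed = False
        for rid, want in requested.items():
            if spare == 0:
                break
            if granted[rid] < want:
                granted[rid] += 1
                spare -= 1
                progressed = True
        if not progressed:
            break  # every request already has all the branches it wanted
    return granted


if __name__ == "__main__":
    print(admit_branches({"a": 4, "b": 1, "c": 2}, step_budget=5))
```

A real controller would presumably set the budget from measured load rather than a constant; the fixed cap here only illustrates the trade-off between one request's parallelism and the rest of the batch.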