AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Threshold-Based Exclusive Batching for LLM Inference
Researchers show that exclusive batching (EB) can outperform mixed batching (MB), the industry-standard approach, for LLM inference on bandwidth-constrained GPUs; the crossover point between the two depends on hardware specifications and workload composition. A new hybrid scheduler (EB+) switches dynamically between the two strategies to optimize throughput as traffic conditions vary.