AI · Neutral · arXiv – CS AI · 8h ago · 6/10
Human-Less LLM Serving: Quantifying the Human Tax on Throughput
Researchers quantify the efficiency cost of serving LLMs under latency targets designed for human users. Meeting time-to-first-token (TTFT) and time-per-output-token (TPOT) targets reduces throughput by 60-93% for AI workloads that do not need human-perceptible latency. The study shows that one-size-fits-all SLA configurations waste substantial compute when applied to programmatic AI-to-AI tasks.