AIBullisharXiv โ CS AI ยท 3h ago7/10
๐ง
SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
SAGA is a new distributed GPU scheduler that treats entire AI agent workflows as atomic units rather than individual inference calls, reducing task completion time by 1.64x compared to existing solutions. The system achieves this through workflow-aware scheduling, KV cache optimization, and fairness mechanisms, though with a tradeoff of 30% lower peak throughput suitable for latency-sensitive interactive deployments.
๐ข Meta