AI · Bullish · arXiv - CS AI · 10h ago · 7/10
Justitia: Fair and Efficient Scheduling of Task-parallel LLM Agents with Selective Pampering
Justitia is a new scheduler for task-parallel LLM agents on GPU servers. It allocates resources selectively, favoring agents according to their predicted completion order, and combines memory-centric cost quantification with virtual-time fair queuing to achieve both efficiency and fairness in LLM serving.
🏢 Meta
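
To make the summary's two mechanisms concrete, here is a minimal sketch of virtual-time fair queuing driven by a memory-centric cost. It is not Justitia's implementation: the cost model (KV-cache tokens multiplied by predicted seconds held), the `Agent` and `VirtualTimeScheduler` names, and the self-clocked rule for advancing virtual time are all assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # Each task is (kv_cache_tokens, predicted_seconds); both are hypothetical fields.
    tasks: list = field(default_factory=list)


def memory_cost(agent: Agent) -> float:
    """Assumed cost model: KV-cache tokens held, summed over predicted time."""
    return sum(tokens * seconds for tokens, seconds in agent.tasks)


class VirtualTimeScheduler:
    """Serves agents in ascending order of virtual finish tag."""

    def __init__(self) -> None:
        self.virtual_time = 0.0
        self._queue = []  # min-heap of (finish_tag, arrival_seq, agent)
        self._seq = 0     # tie-breaker so agents themselves are never compared

    def submit(self, agent: Agent) -> float:
        # The finish tag is the virtual time at arrival plus the predicted cost.
        finish_tag = self.virtual_time + memory_cost(agent)
        heapq.heappush(self._queue, (finish_tag, self._seq, agent))
        self._seq += 1
        return finish_tag

    def next_agent(self):
        if not self._queue:
            return None
        finish_tag, _, agent = heapq.heappop(self._queue)
        # Self-clocked simplification: virtual time jumps to the tag being served,
        # so later arrivals cannot keep overtaking agents that are already waiting.
        self.virtual_time = finish_tag
        return agent


def demo() -> list:
    sched = VirtualTimeScheduler()
    order = []
    sched.submit(Agent("A", [(1000, 2.0), (1000, 3.0)]))  # cost 5000, tag 5000
    sched.submit(Agent("B", [(500, 1.0)]))                # cost 500,  tag 500
    sched.submit(Agent("C", [(800, 2.5)]))                # cost 2000, tag 2000
    order.append(sched.next_agent().name)                 # B; virtual time -> 500
    sched.submit(Agent("D", [(1000, 1.0)]))               # cost 1000, tag 1500
    order.append(sched.next_agent().name)                 # D
    order.append(sched.next_agent().name)                 # C; virtual time -> 2000
    sched.submit(Agent("E", [(900, 5.0)]))                # cost 4500, tag 6500
    while (agent := sched.next_agent()) is not None:
        order.append(agent.name)
    return order


if __name__ == "__main__":
    print(demo())  # ['B', 'D', 'C', 'A', 'E']
```

In this sketch, cheap agents that arrive early are served first, which favors fast completion. Agent E costs less than A but arrives after virtual time has advanced, so it queues behind A, and that is how a fair-queuing rule bounds the delay of large agents.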