AI · Bullish · arXiv – CS AI · 2h ago · 7/10
A Policy-Driven Runtime Layer for Agentic LLM Serving
Researchers propose a runtime layer for serving multi-agent LLM systems that sits between application frameworks and inference engines. The layer provides unified policy management for cross-cutting concerns such as caching and fairness. In the reported results, CacheSage improves cache hit rates by 13-37% and reduces time-to-first-token latency by 12-29%.