AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge-Cloud Speculative LLM Serving
ConfigSpec introduces a profiling-based framework for optimizing distributed LLM inference across edge-cloud systems using speculative decoding. The research finds that no single configuration simultaneously optimizes throughput, cost efficiency, and energy efficiency, so deployments need dynamic, device-aware configuration selection rather than a fixed setup.