arXiv · CS AI · 6h ago
SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud
Researchers tested distributed AI inference across device, edge, and cloud tiers in a 5G network, finding that the sub-second response times required for embodied AI are difficult to achieve. On-device execution took multiple seconds; RAN-edge (radio access network) deployment with quantized models could meet 0.5-second deadlines; and cloud deployment achieved a 100% success rate against 1-second deadlines.
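The trade-off the summary describes can be sketched as a simple SLA check over per-tier latencies. This is a minimal illustration, not the paper's code: the tier names and latency figures below are rough placeholders inferred from the summary ("multiple seconds" on device, under 0.5 s at the RAN edge, within 1 s from the cloud), not measured values.

```python
# Hypothetical per-tier end-to-end inference latencies in seconds,
# loosely based on the summary's qualitative findings (assumed values).
TIER_LATENCY_S = {
    "on-device": 3.0,   # assumed: "multiple seconds"
    "ran-edge": 0.45,   # assumed: quantized model under the 0.5 s deadline
    "cloud": 0.8,       # assumed: comfortably within a 1 s deadline
}

def tiers_meeting_sla(deadline_s: float) -> list[str]:
    """Return the deployment tiers whose latency fits the SLA deadline."""
    return [tier for tier, lat in TIER_LATENCY_S.items() if lat <= deadline_s]

print(tiers_meeting_sla(0.5))  # only the RAN edge meets a 0.5 s SLA
print(tiers_meeting_sla(1.0))  # RAN edge and cloud meet a 1 s SLA
```

Under these assumed numbers, tightening the deadline from 1 s to 0.5 s eliminates the cloud tier, which matches the paper's reported pattern.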