AI · Bullish · arXiv - CS AI · 1d ago · 6/10
APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs
Researchers propose APreQEL, an adaptive mixed precision quantization method for deploying large language models on edge devices. The approach optimizes memory, latency, and accuracy by applying different quantization levels to different layers based on their importance and hardware characteristics.
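To make the idea of per-layer precision concrete, here is a minimal sketch of one way to assign bit-widths under a memory budget. It is not APreQEL's actual algorithm, which the summary does not describe. The `Layer` class, the `importance` score, the `assign_bit_widths` function, and the 2/4/8-bit choices are all hypothetical names and values picked for illustration. The sketch models only the memory constraint; latency and other hardware characteristics are left out.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    importance: float  # hypothetical sensitivity score; higher = more sensitive


def assign_bit_widths(layers, budget_bits, choices=(2, 4, 8)):
    """Greedy allocation (an assumption, not the paper's method).

    Start every layer at the lowest precision, then visit layers from most
    to least important and give each the highest bit-width that still fits
    within the total memory budget (measured in bits of weight storage).
    """
    low = min(choices)
    bits = {layer.name: low for layer in layers}
    used = sum(layer.num_params * low for layer in layers)
    if used > budget_bits:
        raise ValueError("budget too small even at the lowest precision")

    for layer in sorted(layers, key=lambda l: l.importance, reverse=True):
        for b in sorted(choices, reverse=True):
            extra = layer.num_params * (b - bits[layer.name])
            if extra > 0 and used + extra <= budget_bits:
                bits[layer.name] = b
                used += extra
                break
    return bits


if __name__ == "__main__":
    # Toy model: parameter counts and importance scores are made up.
    model = [
        Layer("attn", 100, 0.9),
        Layer("mlp", 200, 0.5),
        Layer("embed", 100, 0.1),
    ]
    print(assign_bit_widths(model, budget_bits=2000))
```

A greedy pass like this is simple but not optimal. A real system would likely measure importance empirically (for example, from the accuracy drop when a layer is quantized) and would also account for which bit-widths the target hardware executes efficiently.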