arXiv · CS AI · 10h ago
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
Researchers introduce CoA-LoRA, a method that dynamically adapts LoRA fine-tuning to different quantization configurations without requiring separate retraining for each setting. The approach uses a configuration-aware model together with a Pareto-based search to optimize the low-rank adjustments across heterogeneous edge devices, achieving performance comparable to configuration-specific retraining while adding no extra computational cost per configuration.
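The summary does not spell out the mechanics, but the core idea, a single set of LoRA weights conditioned on the quantization configuration, can be illustrated with a rough sketch. The following PyTorch snippet is a hypothetical reading, not CoA-LoRA's actual implementation: the module name `ConfigAwareLoRALinear`, the use of a small hypernetwork that maps a bit-width vector to per-rank scales, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConfigAwareLoRALinear(nn.Module):
    """Illustrative sketch: a LoRA adapter whose low-rank update is
    modulated by an embedding of the quantization configuration
    (e.g., per-group bit widths), so one set of adapter weights can
    serve many quantization settings without separate retraining."""

    def __init__(self, base_linear: nn.Linear, rank: int = 8, config_dim: int = 4):
        super().__init__()
        self.base = base_linear  # stands in for the frozen, quantized base layer
        for p in self.base.parameters():
            p.requires_grad_(False)

        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))

        # Hypothetical hypernetwork: maps a quantization-config vector
        # (e.g., bit widths) to per-rank scales on the LoRA update.
        self.config_mlp = nn.Sequential(
            nn.Linear(config_dim, 32), nn.ReLU(), nn.Linear(32, rank)
        )

    def forward(self, x: torch.Tensor, quant_config: torch.Tensor) -> torch.Tensor:
        scales = self.config_mlp(quant_config)        # (rank,) scales for this config
        delta = (x @ self.lora_A.t()) * scales        # (..., rank) low-rank projection
        return self.base(x) + delta @ self.lora_B.t()


# Usage: the same adapter weights serve different quantization settings;
# only the configuration vector fed to the hypernetwork changes.
layer = ConfigAwareLoRALinear(nn.Linear(768, 768), rank=8, config_dim=4)
x = torch.randn(2, 16, 768)
cfg_high = torch.tensor([8.0, 8.0, 4.0, 4.0])  # illustrative bit-width vector
cfg_low = torch.tensor([4.0, 2.0, 2.0, 4.0])
out_high = layer(x, cfg_high)
out_low = layer(x, cfg_low)
```

In a sketch like this, moving a device from one quantization setting to another only changes the configuration vector passed at inference time; the trained adapter parameters stay fixed, which is what avoids per-configuration retraining.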