AI Bullish arXiv – CS AI · 5d ago 7/104
ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
Researchers propose ROMA, a hardware accelerator for running QLoRA-based large language models on edge devices. It stores the quantized base model in ROM and the LoRA weights in SRAM, and reportedly generates more than 20,000 tokens/s without external memory.
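The split between a frozen quantized base model and small LoRA weights can be illustrated in software. The sketch below is an analogy only, not ROMA's hardware design. It assumes a single linear layer, a plain symmetric 4-bit integer quantizer (QLoRA itself uses the NF4 data type with block-wise scales), and hypothetical names such as `quantize_4bit` and `qlora_forward`. Immutable tuples stand in for ROM and mutable lists stand in for SRAM.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Codes = Tuple[Tuple[int, ...], ...]


def quantize_4bit(weights: Matrix) -> Tuple[Codes, float]:
    """Symmetric 4-bit quantization with one scale for the whole matrix.

    Assumption for brevity: real QLoRA uses NF4 with block-wise scales.
    """
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    scale = max_abs / 7.0  # map values into the integer range [-7, 7]
    # Nested tuples are immutable, standing in for read-only memory.
    codes = tuple(tuple(round(w / scale) for w in row) for row in weights)
    return codes, scale


def matvec(m: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def qlora_forward(codes: Codes, scale: float, lora_a: Matrix,
                  lora_b: Matrix, x: Sequence[float],
                  alpha: float = 1.0) -> List[float]:
    """Compute y = dequant(W_q) x + alpha * B (A x)."""
    base = [scale * v for v in matvec(codes, x)]        # frozen base path
    update = matvec(lora_b, matvec(lora_a, x))          # low-rank adapter path
    return [b + alpha * u for b, u in zip(base, update)]


# Frozen base model: quantized once, never modified afterwards.
base_weights = [[0.7, -0.3], [0.1, 0.4]]
codes, scale = quantize_4bit(base_weights)

# Rank-1 LoRA adapter kept in ordinary mutable memory.
lora_a = [[1.0, 1.0]]      # shape (1, 2)
lora_b = [[0.0], [0.0]]    # shape (2, 1), zero so the adapter starts as a no-op
x = [1.0, 2.0]

y_base = qlora_forward(codes, scale, lora_a, lora_b, x)

# Changing behavior only touches the adapter, never the base codes.
lora_b[0][0] = 0.5
y_tuned = qlora_forward(codes, scale, lora_a, lora_b, x)
```

Only `lora_a` and `lora_b` change between the two calls. The base codes are written once and then only read, which is the property that makes a read-only store plausible for them.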