y0news
#rom
1 article
AI · Bullish · arXiv – CS AI · 5d ago · 7/10 · 4

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Researchers propose ROMA, a hardware accelerator for running QLoRA-based large language models on edge devices. The design stores the quantized base model in ROM and the LoRA weights in SRAM, and reaches a generation speed of over 20,000 tokens/s without external memory.
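
The split described above, a fixed quantized base plus small updatable LoRA weights, can be illustrated with a rough software sketch. This is not ROMA's design or code: the class names `FrozenBase` and `LoraAdapter`, the single shared scale factor, and the omission of the usual LoRA scaling term are all simplifying assumptions made for illustration. It only shows the general QLoRA-style computation, where the output is the dequantized base projection plus a low-rank correction.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass(frozen=True)
class FrozenBase:
    """Hypothetical stand-in for quantized base weights kept in read-only memory."""
    codes: Tuple[Tuple[int, ...], ...]  # integer quantization codes
    scale: float                        # one shared scale (a simplification)

    def dequantize(self) -> Matrix:
        return [[c * self.scale for c in row] for row in self.codes]


@dataclass
class LoraAdapter:
    """Hypothetical stand-in for LoRA weights kept in rewritable memory."""
    a: Matrix  # shape: rank x d_in
    b: Matrix  # shape: d_out x rank


def qlora_forward(base: FrozenBase, adapter: LoraAdapter, x: Vector) -> Vector:
    """Compute dequant(W_q) @ x + B @ (A @ x); LoRA scaling is omitted here."""
    base_out = matvec(base.dequantize(), x)
    lora_out = matvec(adapter.b, matvec(adapter.a, x))
    return [p + q for p, q in zip(base_out, lora_out)]


if __name__ == "__main__":
    base = FrozenBase(codes=((1, 2), (3, 4)), scale=0.5)
    adapter = LoraAdapter(a=[[1.0, 0.0]], b=[[1.0], [2.0]])
    print(qlora_forward(base, adapter, [2.0, 4.0]))  # [7.0, 15.0]
```

In this sketch the base object is immutable while the adapter can be swapped freely, which loosely mirrors the idea of keeping the large fixed weights in ROM and only the small task-specific weights in SRAM.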