#qlora (2 articles)
arXiv – CS AI · 5d ago · 7/10

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.
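The summary describes a split memory layout: the frozen, quantized base weights live in ROM while the small LoRA update stays in writable SRAM. The forward path this implies can be sketched in a few lines of plain Python; the matrices, dimensions, and `scale` factor below are illustrative placeholders, not ROMA's actual datapath or the paper's notation.

```python
# Sketch of a LoRA-augmented forward pass: y = W x + scale * B (A x),
# where W is the frozen base weight (ROM-resident in ROMA's design)
# and A, B form the low-rank trainable update (SRAM-resident).
# All shapes here are tiny, hypothetical examples.

def matvec(M, x):
    """Plain dense matrix-vector product over nested lists."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    base = matvec(W, x)               # frozen base projection
    delta = matvec(B, matvec(A, x))   # rank-r update: B (d_out x r), A (r x d_in)
    return [b + scale * d for b, d in zip(base, delta)]
```

Because only `A` and `B` change during fine-tuning, the large base matrix can sit in read-only storage, which is the property the accelerator exploits.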

Hugging Face Blog · May 24 · 7/10

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

The article discusses advances in making Large Language Models (LLMs) more accessible through the bitsandbytes library, 4-bit quantization techniques, and QLoRA (Quantized Low-Rank Adaptation). These technologies enable running and fine-tuning large AI models on consumer hardware with significantly reduced memory requirements.
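To give a feel for how 4-bit quantization cuts memory, here is a toy blockwise absmax quantizer in plain Python. This is a deliberate simplification: bitsandbytes' NF4 format uses a nonlinear codebook derived from a normal distribution, not the uniform int4 grid shown here, and the function names are made up for illustration.

```python
# Toy blockwise absmax quantization to a signed 4-bit range [-8, 7].
# Each block of weights shares one float scale; values are rounded
# to the nearest representable integer and stored in 4 bits each.

def quantize_block(block):
    absmax = max(abs(v) for v in block) or 1.0
    scale = absmax / 7.0                      # map the largest magnitude to 7
    q = [max(-8, min(7, round(v / scale))) for v in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]
```

Storing one 4-bit integer per weight plus one scale per block is what brings a model's memory footprint down far enough to fit on consumer GPUs; QLoRA then fine-tunes a small LoRA adapter on top of the quantized base.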