AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
OSCAR: Online Soft Compression And Reranking
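Researchers introduce OSCAR, a query-dependent online soft compression method for Retrieval-Augmented Generation (RAG) systems. Rather than passing retrieved documents to the LLM as full text, OSCAR compresses them into a small set of continuous embeddings at inference time, conditioned on the query. This reduces computational overhead while largely preserving answer quality: the method speeds up inference by 2-5x with minimal accuracy loss across LLMs ranging from 1B to 24B parameters.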
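To make "query-dependent soft compression" concrete, here is a toy sketch. It is not OSCAR's method, which learns its compressor. The chunked, query-weighted pooling below is a hypothetical stand-in, and the function name `toy_soft_compress` and the `rate` parameter are illustrative assumptions. It shows the two core ideas: each document becomes fewer continuous vectors instead of tokens, and the result changes with the query.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_soft_compress(doc_tokens: list[Vector], query: Vector, rate: int) -> list[Vector]:
    """Compress n token vectors into ceil(n / rate) continuous vectors.

    Hypothetical stand-in for a learned compressor: each chunk of `rate`
    consecutive tokens is pooled with softmax weights over token-query
    dot products, so the compressed output depends on the query.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    compressed: list[Vector] = []
    for start in range(0, len(doc_tokens), rate):
        chunk = doc_tokens[start:start + rate]
        weights = softmax([dot(tok, query) for tok in chunk])
        dim = len(chunk[0])
        # Weighted average of the chunk's token vectors, one dimension at a time.
        pooled = [sum(w * tok[d] for w, tok in zip(weights, chunk)) for d in range(dim)]
        compressed.append(pooled)
    return compressed


# Usage: 5 token vectors compressed at rate 2 become 3 "soft" vectors.
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
memory = toy_soft_compress(doc, query=[1.0, 0.0], rate=2)
print(len(memory))  # 3
```

The LLM would then read the 3 compressed vectors instead of 5 token embeddings. Shrinking the context this way is the source of the inference savings, while conditioning on the query helps keep the information relevant to the question.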
Hugging Face