AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
OSCAR: Online Soft Compression And Reranking
Researchers introduce OSCAR, a query-dependent online soft compression method for Retrieval-Augmented Generation (RAG) systems that reduces computational overhead while maintaining performance. The method speeds up inference by 2-5x with minimal accuracy loss across LLMs ranging from 1B to 24B parameters.
🏢 Hugging Face