y0news

#semantic-caching News & Analysis

3 articles tagged with #semantic-caching. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

From Exact Hits to Close Enough: Semantic Caching for LLM Embeddings

Researchers propose semantic caching for large language models, which improves response times and reduces costs by serving semantically similar requests from cache. The study proves that optimal offline semantic caching is NP-hard and introduces polynomial-time heuristics and online policies that combine recency, frequency, and locality.
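
The idea of a "close enough" cache hit can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `SemanticCache` class, the cosine-similarity threshold, and plain least-recently-used eviction, which covers only the recency factor and none of the paper's heuristics or policies.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Hypothetical toy cache: a lookup hits when a stored embedding is
    at least `threshold` similar to the query embedding."""

    def __init__(self, capacity=2, threshold=0.9):
        self.capacity = capacity
        self.threshold = threshold
        # key -> (embedding, response), ordered from least to most recently used
        self.entries = OrderedDict()

    def get(self, embedding):
        best_key, best_sim = None, self.threshold
        for key, (vec, _) in self.entries.items():
            sim = cosine(embedding, vec)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None  # miss: nothing is similar enough
        self.entries.move_to_end(best_key)  # refresh recency on a hit
        return self.entries[best_key][1]

    def put(self, key, embedding, response):
        self.entries[key] = (embedding, response)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


cache = SemanticCache(capacity=2, threshold=0.9)
cache.put("q1", [1.0, 0.0], "answer-1")
cache.put("q2", [0.0, 1.0], "answer-2")
near_hit = cache.get([0.99, 0.05])  # close to q1, so it reuses q1's response
far_miss = cache.get([0.7, 0.7])    # not close enough to either entry
```

An exact-match cache would miss on `[0.99, 0.05]`; the similarity threshold is what turns it into a hit. A linear scan is used only to keep the sketch short.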

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Researchers introduce Krites, an asynchronous caching system for large language models that uses LLM judges to verify cached responses, improving efficiency without changing serving decisions. The system increases the fraction of requests served with curated static answers by up to 3.9 times while leaving critical-path latency unchanged.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17

Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG

Researchers have developed Higress-RAG, an enterprise-grade framework that addresses key challenges in Retrieval-Augmented Generation systems, including low retrieval precision, hallucination, and high latency. The system introduces 50 ms semantic caching, hybrid retrieval methods, and corrective evaluation to optimize the entire RAG pipeline for production use.
