#caching News & Analysis

5 articles tagged with #caching. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

🧠

BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching

Researchers have developed BWCache, a training-free method that accelerates Diffusion Transformer (DiT) video generation by up to 6× through block-wise feature caching and reuse. The technique exploits computational redundancy in DiT blocks across timesteps while maintaining visual quality, addressing a key bottleneck in real-world AI video generation applications.
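The block-wise reuse idea can be illustrated with a toy wrapper. This is a minimal sketch and not the BWCache implementation: the `CachedBlock` class, the `relative_change` rule, and the `threshold` value are all hypothetical, chosen only to show one way a block's output might be reused when its input barely changes between timesteps.

```python
from typing import Callable, List, Optional

Vector = List[float]


def relative_change(a: Vector, b: Vector) -> float:
    """Relative L1 difference of a with respect to b (hypothetical similarity rule)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1.0
    return num / den


class CachedBlock:
    """Wraps one block and reuses its last output when the input barely moved."""

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float) -> None:
        self.fn = fn
        self.threshold = threshold  # assumed tuning knob, not from the paper
        self.last_input: Optional[Vector] = None
        self.last_output: Optional[Vector] = None
        self.calls = 0  # number of real (uncached) evaluations

    def __call__(self, x: Vector) -> Vector:
        if (
            self.last_input is not None
            and relative_change(x, self.last_input) < self.threshold
        ):
            # Input is close to the cached one: skip the block and reuse its output.
            return self.last_output
        # Otherwise run the block and refresh the cache.
        self.calls += 1
        self.last_input = list(x)
        self.last_output = self.fn(x)
        return self.last_output
```

In this sketch each block keeps its own cache, so blocks whose features change quickly are recomputed while stable ones are skipped; the actual criterion BWCache uses to decide on reuse is described in the paper.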

AI · Bullish · OpenAI News · Jan 22 · 7/10 · 7

🧠

Scaling PostgreSQL to power 800 million ChatGPT users

OpenAI scaled PostgreSQL to handle millions of queries per second in support of 800 million ChatGPT users. The scaling relied on database replicas, caching, rate limiting, and workload isolation.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

🧠

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Researchers introduce Krites, an asynchronous caching system for Large Language Models that uses LLM judges to verify cached responses, improving efficiency without changing serving decisions. The system increases the fraction of requests served with curated static answers by up to 3.9× while leaving critical-path latency unchanged.
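One reading of this summary can be sketched as a two-tier cache in which verification happens off the serving path. This is a rough illustration, not Krites itself: the `TieredCache` class, the token-overlap `jaccard` similarity, the `serve_at` and `review_at` thresholds, and the `judge` callable are all assumptions, and the "asynchronous" step is simulated with a queue that is drained later.

```python
from collections import deque


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (a stand-in for a real semantic similarity score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


class TieredCache:
    def __init__(self, static_answers, judge, serve_at=0.9, review_at=0.5):
        self.static = dict(static_answers)  # curated query -> answer
        self.promoted = {}                  # judge-approved query -> answer
        self.judge = judge                  # hypothetical LLM judge: (query, answer) -> bool
        self.serve_at = serve_at            # assumed threshold for serving a static answer
        self.review_at = review_at          # assumed threshold for queuing a review
        self.pending = deque()              # near misses awaiting verification

    def serve(self, query, backend):
        """Serving path: the decision for this request never waits on the judge."""
        if query in self.promoted:
            return self.promoted[query], "static"
        best = max(self.static, key=lambda k: jaccard(query, k), default=None)
        score = jaccard(query, best) if best is not None else 0.0
        if score >= self.serve_at:
            return self.static[best], "static"
        if score >= self.review_at:
            # Near miss: answer normally now, let the judge look at it later.
            self.pending.append((query, self.static[best]))
        return backend(query), "backend"

    def run_verifier(self):
        """Off-path step: promote candidates the judge approves."""
        while self.pending:
            query, candidate = self.pending.popleft()
            if self.judge(query, candidate):
                self.promoted[query] = candidate
```

Because the judge only runs in `run_verifier`, a near-miss request is still answered by the backend as before; only later requests for the same query benefit from the approved static answer.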

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

🧠

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

MeanCache is a training-free caching framework that accelerates Flow Matching inference by using average velocities instead of instantaneous ones. The framework achieves 3.59× to 4.56× acceleration on models such as FLUX.1, Qwen-Image, and HunyuanVideo while delivering better generation quality than existing caching methods.

General · Neutral · Google Research Blog · Jun 25 · 5/10

📰

Optimizing cloud economics with linear elastic caching

This article covers linear elastic caching techniques for optimizing cloud computing costs and performance. It examines cache-management algorithms that scale resources dynamically with demand, reducing infrastructure expenses while maintaining system efficiency.