AIBullisharXiv – CS AI · 14h ago6/10
🧠
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
Researchers introduce VideoMLA, a novel approach that reduces KV cache memory requirements in video diffusion models by 92.7% through Multi-Head Latent Attention, enabling longer video generation with improved efficiency. The method challenges conventional assumptions about low-rank approximations in video models and demonstrates comparable quality to existing methods while improving throughput by 23%.