AI · Bullish · arXiv – CS AI · 8h ago · 7/10
APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
Researchers introduce APB-V, a sequence-parallel framework that accelerates long-video inference in Large Multimodal Models by distributing approximate attention across multiple GPUs. The approach achieves a 12.72x speedup over FlashAttn and processes longer videos without visual compression, addressing a critical bottleneck in AI video understanding.