AI · Bullish · arXiv – CS AI · 3h ago · 6/10
VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer
VidPrism introduces a heterogeneous Mixture-of-Experts framework that enhances Vision-Language Models for video understanding by deploying specialized experts rather than identical generalists. The approach uses dynamic multi-rate sampling and bidirectional fusion to achieve state-of-the-art performance on video recognition benchmarks.