AI · Neutral · arXiv – CS AI · 5h ago · 6/10
MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models
Researchers introduce MotionEnhancer, a technique that pairs Video Diffusion Models with Vision-Language Models to improve fine-grained motion understanding in video. The approach adds no parameters: it uses attention alignment to extract motion priors, requires no additional training or architectural modifications, and achieves consistent improvements on motion-understanding benchmarks.