AINeutralarXiv โ CS AI ยท 5h ago1
๐ง
Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
Researchers have developed new methods to understand how Video Diffusion Transformers convert motion-related text descriptions into video content. The study introduces GramCol and Interpretable Motion-Attentive Maps (IMAP) to spatially and temporally localize motion concepts in AI-generated videos without requiring gradient calculations.