AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis
Researchers investigated whether pretrained video foundation models encode intuitive physics by probing the frozen representations of three model types (V-JEPA, VideoMAE, and LTX-Video) layer by layer. Physics knowledge emerges reliably in the intermediate-to-late layers, V-JEPA performs strongest of the three, and temporal information proves critical for capturing physical dynamics.
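To make the idea of layerwise probing concrete, here is a minimal sketch rather than the paper's actual pipeline. It assumes each layer of a frozen model yields one pooled feature vector per video and that the task is a binary label (for example, physically plausible vs. implausible). The features are synthetic stand-ins, the per-layer `signals` values are invented for illustration, and the probe is a simple nearest-class-mean classifier chosen for brevity; the summary does not say which probe the authors used. All function names are hypothetical.

```python
import random


def make_layer_features(labels, signal, dim=8, seed=0):
    """Synthetic stand-in for one frozen layer's pooled features.

    ASSUMPTION: `signal` sets how strongly the label is linearly encoded
    in this layer. Nothing here comes from a real model.
    """
    rng = random.Random(seed)
    feats = []
    for y in labels:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if y == 1 else -signal  # label shifts one dimension
        feats.append(vec)
    return feats


def fit_centroid_probe(feats, labels):
    """Fit a nearest-class-mean probe: one mean vector per class."""
    dim = len(feats[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in zip(feats, labels):
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda c: sqdist(centroids[c], x))


def probe_accuracy(train_x, train_y, test_x, test_y):
    """Train the probe on one split and score it on a held-out split."""
    centroids = fit_centroid_probe(train_x, train_y)
    correct = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


def layerwise_probe(signals, n_train=200, n_test=200, seed=0):
    """Fit an independent probe per layer; the 'model' is never updated."""
    rng = random.Random(seed)
    train_y = [rng.randint(0, 1) for _ in range(n_train)]
    test_y = [rng.randint(0, 1) for _ in range(n_test)]
    results = []
    for layer, signal in enumerate(signals):
        train_x = make_layer_features(train_y, signal, seed=seed + 2 * layer + 1)
        test_x = make_layer_features(test_y, signal, seed=seed + 2 * layer + 2)
        results.append(probe_accuracy(train_x, train_y, test_x, test_y))
    return results


if __name__ == "__main__":
    # Invented signal strengths: weak in early layers, strong in later ones.
    for layer, acc in enumerate(layerwise_probe([0.0, 0.5, 2.0, 3.0])):
        print(f"layer {layer}: probe accuracy = {acc:.2f}")
```

Because the backbone stays frozen and only the small probe is fit, differences in probe accuracy across layers can be read as differences in what each layer's representation already encodes, which is the logic behind comparing early, intermediate, and late layers.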