AI · Neutral · arXiv – CS AI · 15h ago · 6/10
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
Researchers propose a game-theoretic approach to weakly-supervised video temporal grounding that treats video frames and query words as players in a cooperative game in order to improve moment localization. The method targets limitations of existing contrastive learning approaches by enabling fine-grained cross-modal interaction without relying on complex moment proposals, and the authors report superior performance on benchmark datasets.
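The summary does not say which game-theoretic quantity the paper uses, so the following is only a minimal sketch of what "frames and words as cooperative game players" can look like, not the authors' method. It assumes a Shapley-value formulation and a made-up payoff function; the names `shapley_values`, `toy_value`, `SIM`, and all similarity numbers are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of every player for the characteristic function `value`.

    `value` maps a frozenset of players to a real-valued payoff.
    Cost is exponential in len(players), so this is only for toy examples.
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k forming before p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p to this coalition.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical frame-word similarity scores (invented numbers, not from the paper).
SIM = {
    ("frame0", "person"): 0.1,
    ("frame1", "person"): 0.8,
    ("frame1", "jumps"): 0.2,
    ("frame2", "jumps"): 0.9,
}


def toy_value(coalition):
    """Assumed payoff: total similarity of frame-word pairs fully inside the coalition."""
    return sum(s for (f, w), s in SIM.items() if f in coalition and w in coalition)


PLAYERS = ["frame0", "frame1", "frame2", "person", "jumps"]
SCORES = shapley_values(PLAYERS, toy_value)

if __name__ == "__main__":
    for player in PLAYERS:
        print(player, round(SCORES[player], 3))
```

In this toy game each player's score is its share of the payoff it helps create, so frames that align with some query word (here `frame1` and `frame2`) score far higher than `frame0`, and the scores sum to the payoff of the full coalition. A real system would replace `toy_value` with a learned cross-modal matching score and would need an approximation, since exact Shapley computation is exponential in the number of players.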