#temporal-grounding News & Analysis

7 articles tagged with #temporal-grounding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model for speech, environmental sound, and music understanding, with capabilities in captioning, question answering, and temporal grounding. It introduces DeepStack cross-layer feature injection and time markers that provide explicit temporal cues, and is released in 4B and 8B variants for instruction-following and reasoning tasks.

AI · Neutral · arXiv – CS AI · Jun 12 · 6/10

TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation

Researchers introduce TrajGenAgent, an LLM-based framework that generates realistic synthetic human mobility trajectories without model fine-tuning by combining hierarchical agent design with deterministic workflows. The approach addresses privacy and cost constraints in trajectory data collection while maintaining semantic coherence and behavioral realism.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

Researchers introduce ExtremeWhenBench, a benchmark for natural-language temporal grounding in hour-long videos. The study finds that video-language models fail dramatically on long-form content because search, not recognition, is the bottleneck; a hybrid retrieve-then-ground approach achieves 6.7x the performance of monolithic models.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Towards One-to-Many Temporal Grounding

Researchers introduce One-to-Many Temporal Grounding (OMTG), a new AI task that requires localizing all video segments matching a single text query. They establish the first OMTG benchmark, with 56k samples and new evaluation metrics, and report a score of 43.65%, outperforming advanced models such as Gemini 2.5 Pro by 15.85%.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

Researchers introduce MusTBENCH, a benchmark for evaluating the temporal grounding capabilities of Large Audio-Language Models (LALMs) in music understanding, and propose MusT, an optimization framework that significantly improves model performance on time-sensitive musical tasks such as locating instrument entries and rhythmic transitions.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective

Researchers propose a novel game-theoretic approach to weakly-supervised video temporal grounding that models video frames and query words as cooperative game players to improve moment localization. The method addresses limitations in existing contrastive learning approaches by enabling fine-grained cross-modal interaction without relying on complex moment proposals, demonstrating superior performance on benchmark datasets.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Researchers introduce TimeLens, a family of multimodal large language models optimized for video temporal grounding that outperforms existing open-source models and even surpasses proprietary models like GPT-5 and Gemini-2.5-Flash. The work addresses critical data quality issues in existing benchmarks and introduces improved training datasets and algorithmic design principles.
