AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Researchers introduce TimeLens, a family of multimodal large language models optimized for video temporal grounding that outperforms existing open-source models and even surpasses proprietary models like GPT-5 and Gemini-2.5-Flash. The work addresses critical data quality issues in existing benchmarks and introduces improved training datasets and algorithmic design principles.