AI · Bullish · arXiv – CS AI · 10h ago · 6/10
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Researchers introduce TimeLens, a family of multimodal large language models optimized for video temporal grounding that outperforms existing open-source models and even surpasses proprietary models such as GPT-5 and Gemini-2.5-Flash. The work addresses critical data-quality issues in existing benchmarks and introduces improved training datasets and algorithmic design principles.
GPT-5 · Gemini