Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding
Researchers introduce LyraV, a streaming video-language model that keeps video perception and language generation synchronized in real time without pausing. The system uses a hierarchical control framework with two key components, a Frame-Driven Transition Controller and a Streaming Token Pacer, to interleave incoming video frames with generated tokens, reaching 3.89 FPS with 98.29% synchrony.
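The summary does not say how the two controllers work internally. The sketch below is a minimal, hypothetical illustration of the general idea of interleaving frames and tokens without stalling the video. It is not LyraV's method. All names are invented for illustration: `run_stream`, `Frame`, `Token`, the `respond` callback (standing in for a transition controller that decides, per frame, whether to start a response), and `tokens_per_frame` (standing in for a pacer's per-interval token budget). The rule that a new response replaces an unfinished one is also an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Union


@dataclass(frozen=True)
class Frame:
    index: int


@dataclass(frozen=True)
class Token:
    text: str


Event = Union[Frame, Token]


def run_stream(
    num_frames: int,
    respond: Callable[[int], List[str]],
    tokens_per_frame: int,
) -> List[Event]:
    """Hypothetical interleaving loop in which generation never delays a frame.

    respond(i): stand-in for a transition controller; returns the tokens of a
    new response triggered at frame i, or [] to keep the current state.
    tokens_per_frame: stand-in for a pacer; the maximum number of tokens
    emitted between two consecutive frames.
    """
    stream: List[Event] = []
    pending: Deque[str] = deque()
    for i in range(num_frames):
        stream.append(Frame(i))  # perception first: the frame is never held back
        new_tokens = respond(i)
        if new_tokens:
            # Assumption: a newly triggered response replaces any unfinished one.
            pending = deque(new_tokens)
        # Pacing: emit at most tokens_per_frame tokens; the rest carry over.
        for _ in range(min(tokens_per_frame, len(pending))):
            stream.append(Token(pending.popleft()))
    return stream


events = run_stream(3, lambda i: ["a", "cat", "appears"] if i == 0 else [], tokens_per_frame=2)
print([e.index if isinstance(e, Frame) else e.text for e in events])
# → [0, 'a', 'cat', 1, 'appears', 2]
```

With a budget of two tokens per frame interval, the three-token response spills into the next interval instead of stopping the video. Frames 1 and 2 still arrive on schedule.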