AI · Bullish · arXiv – CS AI · 10h ago · 5/10
Speech Recognition on TV Series with Video-guided Post-ASR Correction
Researchers have developed a Video-Guided Post-ASR Correction (VPC) framework that uses Video-Large Multimodal Models to improve speech recognition accuracy in complex environments like TV series. The system addresses challenges with multiple speakers, overlapping speech, and domain-specific terminology by leveraging video context to refine ASR outputs.