y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#speaker-diarization News & Analysis

4 articles tagged with #speaker-diarization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBullisharXiv โ€“ CS AI ยท Feb 275/103
๐Ÿง 

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Researchers developed Lipi-Ghor-882, an 882-hour Bengali speech dataset, and demonstrated that targeted fine-tuning with synthetic acoustic degradation significantly improves automatic speech recognition for long-form Bengali audio. Their dual pipeline achieved a 0.019 Real-Time Factor, establishing new benchmarks for low-resource speech processing.

AIBullisharXiv โ€“ CS AI ยท Mar 44/102
๐Ÿง 

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization

Researchers developed a multistage AI approach for Bengali speech transcription and speaker diarization, achieving significant improvements in processing long-form audio recordings. The system used fine-tuned Whisper models and custom segmentation techniques to address the low-resource nature of Bengali in speech technology applications.

AIBullisharXiv โ€“ CS AI ยท Mar 35/105
๐Ÿง 

Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization

Researchers developed a multi-pass LLM post-processing system that significantly improves French clinical speech transcription accuracy by alternating between speaker recognition and word recognition passes. The system achieved significant word error rate reductions in suicide prevention conversations while maintaining stability in neurosurgery consultations with feasible computational costs for clinical deployment.

AINeutralarXiv โ€“ CS AI ยท Feb 274/102
๐Ÿง 

A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment

Researchers developed a robust framework for Bangla automatic speech recognition and speaker diarization that can handle long-form audio exceeding 30-60 seconds. The system uses Voice Activity Detection optimization and Connectionist Temporal Classification segmentation to maintain accuracy over extended durations in multi-speaker environments.