AINeutralarXiv – CS AI · 8h ago6/10
🧠
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
Researchers have developed Bangla-WhisperDiar, a fine-tuned speech recognition and speaker diarization system that achieves a 24.41% word error rate for ASR and 23.92% diarization error rate. The work addresses critical gaps in Bangla language processing by combining OpenAI's Whisper model with PyAnnote's diarization framework, trained on custom datasets with extensive data augmentation techniques.