arXiv – CS AI
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
Researchers evaluated nine automatic speech recognition (ASR) models on Dutch child speech datasets. A fine-tuned Whisper-medium model achieved a word error rate (WER) of 5.54% on clean data but 70.37% on noisy data. Using an utterance-level selection method, they identified 42% of clean recordings as reliably transcribed without manual verification, at 98.3% precision, which significantly reduces annotation overhead for child speech research.
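For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are conventionally computed. It is not the paper's evaluation code. The function names `word_error_rate` and `selection_precision` are hypothetical, text normalization (casing, punctuation) is left out, and precision is assumed here to mean the share of automatically accepted utterances whose transcription is actually correct.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper: whitespace tokenization only, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def selection_precision(accepted_is_correct: Sequence[bool]) -> float:
    """Fraction of automatically accepted utterances that are truly correct.

    Assumption: one boolean per accepted utterance, True when a human
    check confirms the automatic transcription.
    """
    if not accepted_is_correct:
        raise ValueError("no accepted utterances to score")
    return sum(accepted_is_correct) / len(accepted_is_correct)
```

For example, `word_error_rate("de kat zit op de mat", "de kat zit op mat")` counts one deleted word out of six reference words. Note that WER can exceed 100% when the hypothesis contains many inserted words.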