
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

arXiv – CS AI | Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik | May 29, 2026 at 04:00 AM

🤖AI Summary

Researchers evaluated nine automatic speech recognition (ASR) models on Dutch child speech datasets. A fine-tuned Whisper-medium model achieved a 5.54% word error rate (WER) on clean data but 70.37% on noisy data. Using an utterance-level selection method, the authors identified 42% of clean recordings as reliably transcribed without manual verification, with 98.3% precision, substantially reducing annotation overhead for child speech research.

Analysis

This research addresses a critical bottleneck in linguistic and developmental research: the manual transcription of child speech, which remains labor-intensive and costly despite advances in automatic speech recognition. The study evaluates current ASR architectures (Whisper, Parakeet, Wav2Vec2) on realistic child speech datasets and finds a stark gap between clean and noisy conditions: a word error rate of 5.54% versus 70.37%, a difference of roughly 65 percentage points. Recording conditions and the characteristics of children's speech therefore remain a major challenge for current systems.
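
Word error rate (WER), the metric behind the 5.54% and 70.37% figures, counts the word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcription, divided by the number of reference words. The gap cited above is 70.37 - 5.54 = 64.83, or roughly 65 percentage points. The sketch below computes WER with a word-level Levenshtein distance. It assumes plain whitespace tokenization and is not the paper's evaluation script, which may apply its own text normalization; the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein distance with simple whitespace tokenization
    (an assumption; real evaluations usually normalize case and punctuation first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (two-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (zit -> zat) and one deletion (de) over six reference words.
print(f"{word_error_rate('de kat zit op de mat', 'de kat zat op mat'):.2%}")  # 33.33%
```

Because insertions are counted, WER can exceed 100% when the ASR output is much longer than the reference.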

The research builds on years of ASR development that has reached near-human performance on adult English speech, yet child speech, especially in low-resource languages, remains underserved. Limited child-specific training data and diverse acoustic conditions compound the difficulty. The study's practical contribution goes beyond raw accuracy: the proposed selection method compares each ASR output with the prompt the child was asked to read and flags utterances whose transcriptions are reliable enough to use directly, without human review.
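
The summary does not specify the exact matching rule, so the following is a minimal sketch under an assumed criterion: an utterance is accepted automatically only when the ASR hypothesis matches the read prompt word for word after lowercasing and stripping punctuation. The names `Utterance`, `normalize`, and `select_reliable` are hypothetical, and the example sentences are invented rather than taken from the study's data.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    prompt: str          # sentence the child was asked to read aloud
    asr_hypothesis: str  # transcript produced by the ASR model


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words."""
    # \w is Unicode-aware for str patterns, so Dutch letters such as 'ë' survive.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def select_reliable(utterances: list[Utterance]) -> list[Utterance]:
    """Return utterances whose ASR output matches the prompt word for word.

    Assumed rule (hypothetical): an exact match after normalization means the
    transcript can be used without review; everything else goes to a human.
    """
    return [u for u in utterances if normalize(u.asr_hypothesis) == normalize(u.prompt)]


# Illustrative Dutch examples (made up, not from the study's data).
batch = [
    Utterance("u1", "De kat zit op de mat.", "de kat zit op de mat"),
    Utterance("u2", "Het regent vandaag.", "het regende vandaag"),
    Utterance("u3", "Ik zie een vis!", "Ik zie een vis"),
]
print([u.utt_id for u in select_reliable(batch)])  # ['u1', 'u3']
```

A strict exact-match rule trades coverage for precision: any deviation, whether a genuine reading error by the child or an ASR mistake, sends the utterance to a human annotator.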

For researchers and institutions studying language acquisition, this approach offers immediate value by automating part of the transcription workflow. With 98.3% precision on the selected utterances, the method reduces the manual verification burden while maintaining quality. This is particularly valuable for low-resource languages, where specialized annotators are scarce and expensive.
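
Precision in this setting is most naturally read as the share of automatically accepted utterances whose ASR transcript turns out to be correct when checked against a manual reference, while the selection rate (42% of clean and 18% of noisy utterances in the study) is the share of all utterances accepted. At 98.3% precision, roughly 17 of every 1,000 auto-accepted utterances would still contain an error. The hypothetical helper below computes both quantities from per-utterance flags; the example data is a toy illustration, not the study's results.

```python
def selection_metrics(selected: list[bool], correct: list[bool]) -> tuple[float, float]:
    """Return (precision, selection_rate) for an automatic utterance-selection step.

    selected[i]: True if utterance i was accepted without manual review.
    correct[i]:  True if its ASR transcript matches the manual reference.
    """
    if len(selected) != len(correct):
        raise ValueError("selected and correct must have the same length")
    if not selected:
        raise ValueError("need at least one utterance")
    n_selected = sum(selected)
    # True positives: accepted automatically AND actually transcribed correctly.
    true_pos = sum(1 for s, c in zip(selected, correct) if s and c)
    # Precision is undefined when nothing was selected.
    precision = true_pos / n_selected if n_selected else float("nan")
    selection_rate = n_selected / len(selected)
    return precision, selection_rate


# Toy data (not from the study): 10 utterances, 4 auto-accepted, 3 of those correct.
selected = [True, True, True, True, False, False, False, False, False, False]
correct = [True, True, True, False, True, False, True, False, True, True]
precision, rate = selection_metrics(selected, correct)
print(f"precision={precision:.1%}, selection rate={rate:.1%}")  # precision=75.0%, selection rate=40.0%
```

Raising the selection rate usually means loosening the acceptance rule, which tends to lower precision; the two numbers are best reported together.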

Looking forward, the large performance gap on the noisy DART data (70.37% WER) indicates that real-world deployment still requires substantial work. Future research should focus on noise-robust models, domain-specific fine-tuning strategies, and a better understanding of which acoustic characteristics of child speech drive ASR failures.

Key Takeaways

→Fine-tuned Whisper-medium achieves 5.54% WER on clean child speech but 70.37% on noisy data, so performance depends heavily on recording conditions.
→A selection method based on prompt comparison identifies 42% of clean and 18% of noisy utterances as reliably transcribed without manual verification.
→The proposed filtering achieves 98.3% precision, substantially reducing annotation overhead in child speech research workflows.
→Child speech in low-resource languages remains under-addressed despite ASR advances, due to limited training data and acoustic diversity.
→Real-world ASR deployment for child speech requires addressing noise robustness and domain-specific model adaptation.

