Phoneme-Level Mispronunciation Screening in Polish-Speaking Children with an Explainable Assistant
Researchers developed an AI-powered screening tool for detecting speech sound errors in Polish-speaking children, using a wav2vec2-based recognizer to identify sibilant substitutions. The system reaches an 88.7% exact sequence match on a held-out test set, and its screening rule achieves 72.9% precision with a 2.7% false-alarm rate. It is designed as a lightweight screening aid for early intervention where specialist evaluation is not readily available.
This research addresses a critical gap in pediatric speech-language pathology by using deep learning to widen access to early screening for speech disorders. Traditional identification of phonological disorders depends on specialist availability, which creates barriers to timely intervention, particularly in underserved regions. The researchers combined wav2vec2-based acoustic recognition with alignment-based error classification to create a lightweight, explainable pipeline tailored to Polish phonetics and to the sibilant substitution patterns common in developing speech.
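The alignment-based classification step can be pictured with a small sketch. It aligns the expected phoneme sequence for a prompted word with the recognizer's output and reports only one-to-one replacements that land on a sibilant target. Everything here is an assumption for illustration: the summary does not say which aligner or phoneme inventory the authors use, so Python's standard `difflib.SequenceMatcher` stands in for the aligner, and the `SIBILANTS` set and `flag_substitutions` function are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical target inventory: Polish sibilant fricatives in IPA.
# The actual target set used by the authors is not given in the summary.
SIBILANTS = {"s", "z", "ʂ", "ʐ", "ɕ", "ʑ"}


def flag_substitutions(expected, recognized):
    """Return (index, expected_phoneme, recognized_phoneme) for each
    one-to-one replacement that falls on a sibilant target."""
    flags = []
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Conservative rule (assumed): insertions, deletions, and
        # unequal-length replacements are not treated as evidence.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for offset in range(i2 - i1):
            target = expected[i1 + offset]
            heard = recognized[j1 + offset]
            if target in SIBILANTS and target != heard:
                flags.append((i1 + offset, target, heard))
    return flags


# "szafa" (wardrobe) produced with [s] in place of [ʂ]
szafa_flags = flag_substitutions(["ʂ", "a", "f", "a"], ["s", "a", "f", "a"])
```

Because each flag carries the position, the target phoneme, and what was heard instead, the output can be shown to a clinician as a readable explanation rather than a bare score.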
The evaluation reports both recognition and screening metrics. The 88.7% exact sequence match on held-out test data describes how often the acoustic model recovers the full phoneme sequence, while the conservative screening rule, which flags a mismatch only when substitution evidence appears at a target segment, prioritizes clinical safety. The 72.9% precision and 2.7% false-alarm rate on correct productions suggest the system limits unnecessary referrals. Recall is moderate at 61.4%, meaning roughly four in ten true substitutions go unflagged, a trade-off that balances sensitivity against specialist resource constraints.
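For readers less familiar with these screening metrics, the sketch below shows how precision, recall, and false-alarm rate relate to confusion counts. It assumes the standard definitions, with the false-alarm rate taken as the share of correct productions that are wrongly flagged. The function name `screening_metrics` and the counts passed to it are illustrative and are not the study's data.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute screening metrics from confusion counts.

    tp: mispronunciations that were flagged
    fp: correct productions that were flagged (false alarms)
    fn: mispronunciations that were missed
    tn: correct productions that were not flagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": false_alarm_rate,
    }


# Illustrative counts only; these are not taken from the paper.
example = screening_metrics(tp=8, fp=2, fn=2, tn=88)
```

A low false-alarm rate can coexist with modest precision when true mispronunciations are rare, because even a small fraction of flagged correct productions adds up against a small number of true positives.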
This development contributes to the broader trend of AI-assisted healthcare delivery in resource-limited settings. Speech screening represents an ideal application domain: high-volume, low-stakes initial assessment where AI augments rather than replaces professional judgment. The authors' emphasis on clinician-in-the-loop validation and clearly defined safety boundaries demonstrates responsible AI deployment in clinical contexts.
Future deployment depends on validating this pipeline across diverse speaker populations and establishing protocols for caregiver-administered screening. The work establishes methodological precedent for language-specific AI diagnostic tools and opens possibilities for similar systems in other languages and phonological conditions. Success here could accelerate adoption of AI screening across pediatric speech pathology and related developmental assessment domains.
- Wav2vec2-based model achieves an 88.7% exact sequence match on held-out test data and is used to detect sibilant substitutions in Polish-speaking children's speech
- Conservative screening approach yields 72.9% precision and 61.4% recall, with only a 2.7% false-alarm rate on correct productions
- System designed for non-specialist administration outside clinical settings, addressing specialist access barriers
- Researchers emphasize explainability and clinician-in-the-loop validation as essential safety requirements
- Method establishes a template for creating language-specific AI screening tools for developmental speech disorders