AIBullisharXiv – CS AI · 7h ago6/10
🧠
Pretrained self-supervised speech models can recognize unseen consonants
Researchers demonstrate that pretrained self-supervised speech models (Wav2Vec2 and HuBERT) can accurately recognize click consonants from low-resource Khoisan languages despite training data heavily skewed toward high-resource languages. Fine-tuning on click-rich language data reveals these models generalize better to rare phonemes than expected, suggesting self-supervision creates robust representations across diverse human speech sounds.