AI · Neutral · arXiv – CS AI · 5h ago · 6/10
Multilingual Multi-Speaker Unit Vocoders: A Systematic Analysis of Discrete Speech Representations
Researchers analyze how discrete speech units derived from self-supervised learning entangle phonetic, speaker, and language information in multilingual vocoder systems. The study demonstrates that cluster size directly controls intelligibility while explicit speaker conditioning prevents identity collapse, with implications for improving Audio LLMs and speech generation systems.