AINeutralarXiv โ CS AI ยท 5h ago
๐ง
ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
Researchers propose ZeSTA, a domain-conditioned training framework that improves personalized speech synthesis by better integrating synthetic and real speech data. The method addresses speaker similarity degradation issues when using zero-shot text-to-speech augmentation with limited real recordings.