AI · Neutral · arXiv CS AI · 4h ago · 4/10
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
Researchers developed a two-stage prompt selection strategy for zero-shot text-to-speech synthesis that improves emotional intensity and speaker consistency. The method evaluates prompts using prosodic features, audio quality, and text-emotion coherence in a static stage, then uses textual similarity for dynamic prompt selection during synthesis.
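
The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the summary names the criteria but not how they are computed, so the `Prompt` fields, the `min_score` threshold, and the token-level Jaccard similarity are all hypothetical stand-ins, and the three scores are assumed to be precomputed values in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    transcript: str
    prosody: float    # hypothetical prosodic-expressiveness score in [0, 1]
    quality: float    # hypothetical audio-quality score in [0, 1]
    coherence: float  # hypothetical text-emotion coherence score in [0, 1]


def static_filter(prompts, min_score=0.5):
    """Static stage: keep prompts whose three scores all reach min_score."""
    return [p for p in prompts
            if min(p.prosody, p.quality, p.coherence) >= min_score]


def jaccard(a, b):
    """Token-level Jaccard similarity, a stand-in for any text similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_prompt(pool, target_text):
    """Dynamic stage: pick the pooled prompt most similar to the target text."""
    if not pool:
        raise ValueError("no prompts survived the static stage")
    return max(pool, key=lambda p: jaccard(p.transcript, target_text))


# Made-up candidates for illustration only.
candidates = [
    Prompt("I can't believe we won the game", 0.9, 0.8, 0.9),
    Prompt("we won the game", 0.2, 0.9, 0.4),
    Prompt("please close the door quietly", 0.7, 0.7, 0.8),
]
pool = static_filter(candidates)                       # runs once, offline
chosen = select_prompt(pool, "we won the final game")  # runs per utterance
```

The split reflects the description above: the static stage can run once over the prompt pool, while the dynamic stage only needs a cheap text comparison for each sentence to be synthesized.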