AI · Bullish · arXiv – CS AI · 9h ago · 7/10
UniVoice: A Unified Model for Speech and Singing Voice Generation
UniVoice is a unified model that generates both speech and singing from text using conditional flow matching. It performs comparably to dedicated speech systems and outperforms existing unified models on singing synthesis. The key idea is to factorize conditioning into content, melody, and timbre components, applying melody constraints only to singing so that speech prosody remains flexible.
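To make the factorized conditioning concrete, here is a minimal sketch, not UniVoice's actual architecture. It assumes per-frame content features, one utterance-level timbre vector, and an optional per-frame melody value (for example, pitch in Hz). The function names `build_conditioning` and `cfm_training_pair`, the feature layout, and the `has_melody` flag are all hypothetical. The second function shows the standard linear-path form of a conditional flow matching training target, which may differ from the variant the paper uses.

```python
from typing import List, Optional, Sequence, Tuple


def build_conditioning(
    content: Sequence[Sequence[float]],
    timbre: Sequence[float],
    melody: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Concatenate content, melody, and timbre features for each frame.

    Hypothetical layout: [content..., melody, has_melody, timbre...].
    For speech (melody is None) the melody slot is zero-filled and the
    has_melody flag is 0.0, so prosody is left unconstrained.
    """
    if melody is not None and len(melody) != len(content):
        raise ValueError("melody must have one value per content frame")

    frames = []
    for i, frame in enumerate(content):
        if melody is None:
            mel, has_melody = 0.0, 0.0  # speech: no melody constraint
        else:
            mel, has_melody = float(melody[i]), 1.0  # singing: follow the score
        frames.append(list(frame) + [mel, has_melody] + list(timbre))
    return frames


def cfm_training_pair(
    x0: Sequence[float], x1: Sequence[float], t: float
) -> Tuple[List[float], List[float]]:
    """Linear-path flow matching: a point x_t between noise x0 and data x1,
    plus the target velocity (x1 - x0) a model would be trained to predict."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


content = [[0.1, 0.2], [0.3, 0.4]]  # two frames of toy content features
timbre = [0.5]                      # toy speaker/singer embedding

speech_cond = build_conditioning(content, timbre)
singing_cond = build_conditioning(content, timbre, melody=[220.0, 246.9])
```

In this sketch the same model input shape serves both tasks; only the melody slot and its flag change between speech and singing.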