AI · Bullish · arXiv – CS AI · 8h ago · 6/10
VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing
Researchers unveiled VITA-QinYu, an expressive spoken language model that goes beyond natural conversation to generate role-play speech and singing, using a hybrid speech-text architecture. The model achieves state-of-the-art performance on conversational benchmarks and shows stronger expressiveness on non-conversational tasks such as role-playing and singing. The researchers have open-sourced the code and provide a streaming-capable demo.