AI · Neutral · arXiv – CS AI · 15h ago · 6/10
DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion
Researchers introduce DSA-Tokenizer, a speech tokenization system that separates semantic content from acoustic style by using distinct optimization paths and Flow Matching decoders. The approach gives discrete Speech LLMs better disentanglement while supporting efficient voice cloning and high-fidelity speech generation in few inference steps.