arXiv – CS AI
HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
HoliTok is a continuous speech tokenization model that unifies speech generation and understanding by encoding 48 kHz audio into compact 128-dimensional latent sequences at 25 Hz. It addresses a key challenge in building unified speech foundation models: creating a tokenization space that balances reconstruction fidelity, semantic preservation, and learnability without requiring architectural workarounds.
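
To make the stated rates concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures from the summary (48 kHz input, 25 Hz latent rate, 128 dimensions); the mono-audio assumption, the whole-frame rounding, and the helper name `latent_shape` are illustrative assumptions, not details of the HoliTok implementation.

```python
# Figures taken from the summary above.
SAMPLE_RATE_HZ = 48_000   # input audio sample rate
FRAME_RATE_HZ = 25        # latent frames per second
LATENT_DIM = 128          # values per latent frame


def latent_shape(duration_s: float) -> tuple[int, int]:
    """Hypothetical helper: (frames, dims) for a mono clip.

    Assumes only whole frames are produced and no padding is applied.
    """
    frames = int(duration_s * FRAME_RATE_HZ)
    return frames, LATENT_DIM


# Each latent frame summarizes this many raw audio samples (mono assumed).
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ

# Latent values emitted per second of audio.
values_per_second = FRAME_RATE_HZ * LATENT_DIM

# Ratio of raw samples to latent values per second.
reduction = SAMPLE_RATE_HZ / values_per_second

print(samples_per_frame, values_per_second, reduction)
print(latent_shape(10.0))
```

Under these assumptions, each latent frame covers 1,920 samples (40 ms of audio), and one second of speech becomes 3,200 latent values in place of 48,000 raw samples, a 15x reduction in the number of values. This counts values only and says nothing about bit rate, since the latents are continuous.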