AIBullisharXiv – CS AI · 6h ago6/10
🧠
Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech
Researchers demonstrate that FSQ (Finite Scalar Quantization) tokenization optimally structures latent space for continuous diffusion models applied to categorical data, offering a non-autoregressive alternative to large language models. Text-to-speech experiments validate FSQ's superiority, achieving better performance than LLM-based approaches while requiring smaller model sizes and faster inference.