AI · Bullish · arXiv – CS AI · 7h ago · 6/10
Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS
Researchers introduce Chatterbox-Flash, a zero-shot text-to-speech model that combines block-diffusion decoding with streaming synthesis. The system counters token distribution bias with prior-calibrated scoring and early-decoding schedules, achieving high-fidelity speech synthesis at low latency, comparable to that of autoregressive systems.