AI · Neutral · arXiv – CS AI · 18h ago · 6/10
BareWave: Waveform-Native Flow-Matching Text-to-Speech
Researchers introduce BareWave, a text-to-speech system that applies flow matching directly to raw waveforms, with no intermediate acoustic representation and no separate decoding stage. The framework addresses three training challenges: the lack of representational scaffolding, noise schedule optimization, and perceptual objective alignment. Inference requires no pretrained components, and the system shows competitive results in zero-shot voice cloning.