AI · Neutral · arXiv – CS AI · 8h ago · 6/10
Acoustic and perceptual differences between standard and accented speech and their voice clones
Researchers compared how well voice cloning preserves accented speech versus standard speech. Clones of accented speakers showed larger perceptual differences from the originals, even though their baseline-normalized embedding distances were similar. The study finds that accent variation significantly affects perceived speaker identity and intelligibility in voice cloning systems, suggesting that current speaker-discriminative embeddings do not fully capture how well an accent is preserved.
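
To make "baseline-normalized embedding distance" concrete, here is a minimal sketch of one plausible reading. The summary does not say how the study defines the normalization, so everything here is an assumption: the function names are hypothetical, the baseline is taken to be the distance between two genuine recordings of the same speaker, and the short vectors are toy stand-ins for real speaker embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def baseline_normalized_distance(original, clone, original_other):
    """Original-to-clone distance divided by a same-speaker baseline.

    Assumption (not from the source): the baseline is the distance between
    two genuine recordings of the same speaker, so a value near 1 means the
    clone differs from the original about as much as the speaker differs
    from themselves across recordings.
    """
    baseline = cosine_distance(original, original_other)
    if baseline == 0.0:
        raise ValueError("baseline distance is zero; cannot normalize")
    return cosine_distance(original, clone) / baseline


# Toy 2-D vectors standing in for speaker embeddings (illustrative only).
original = [1.0, 0.0]
original_other = [0.8, 0.6]   # second genuine recording, same speaker
clone = [0.6, 0.8]            # embedding of the cloned voice

ratio = baseline_normalized_distance(original, clone, original_other)
print(f"baseline-normalized distance: {ratio:.2f}")
```

Under this reading, two speaker groups can have similar ratios while listeners still hear larger differences for one group, which is the kind of gap between embedding-based and perceptual measures the summary describes.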