arXiv – CS AI
Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis
Researchers introduce Sarashina2.2-TTS, a Japanese-focused text-to-speech system trained on 361k hours of speech. It addresses kanji polyphony, where a single kanji can be read in several ways depending on context, through data scaling and targeted data synthesis. The system achieves state-of-the-art Japanese pronunciation accuracy while maintaining cross-lingual robustness, and the authors also introduce a new benchmark for evaluating kanji reading accuracy.
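To make the polyphony problem concrete, here is a minimal, hypothetical sketch. The kanji 生 takes different readings in different words, and a simple exact-match score counts how many words a system reads correctly. The word list, the `reading_accuracy` function, and the exact-match metric are all illustrative assumptions, not the paper's benchmark. In practice, the readings a TTS system actually produced would first have to be recovered from its audio (for example, with a speech recognizer), and that step is omitted here.

```python
# Hypothetical illustration: the kanji 生 is read differently depending on the
# word it appears in. These are standard dictionary readings, not benchmark data.
REFERENCE_READINGS = {
    "生活": "せいかつ",      # seikatsu, "daily life"
    "先生": "せんせい",      # sensei, "teacher"
    "生きる": "いきる",      # ikiru, "to live"
    "生ビール": "なまビール",  # nama biiru, "draft beer"
}


def reading_accuracy(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Return the fraction of words whose predicted kana reading exactly matches the reference.

    Exact-match scoring is an assumption for illustration; the paper's
    benchmark may score readings differently.
    """
    if not reference:
        return 0.0
    correct = sum(1 for word, kana in reference.items() if predicted.get(word) == kana)
    return correct / len(reference)


# A naive system that always reads 生 as "せい" gets half of these words wrong.
naive_predictions = {
    "生活": "せいかつ",
    "先生": "せんせい",
    "生きる": "せいきる",
    "生ビール": "せいビール",
}
print(reading_accuracy(naive_predictions, REFERENCE_READINGS))  # 0.5
```

Context-dependent readings like these are why a reading-focused benchmark is needed: a single fixed reading per kanji can look plausible on common words while failing on others.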