ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

arXiv – CS AI|Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery|May 27, 2026 at 04:00 AM

🤖AI Summary

Researchers have released ParsVoice, a 2,200-hour Persian speech dataset with 1.36 million aligned segments from 1,815 speakers, making it 25 times larger than previous Persian TTS resources. The dataset was constructed using an automated pipeline combining ASR, fine-tuned language models, and quality assessment, and validation shows the corpus enables multi-speaker text-to-speech systems competitive with existing solutions.
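
The headline figures imply a few averages that help put the corpus in context. The sketch below derives them from the numbers quoted in the summary; the function and key names are illustrative, and because the inputs are rounded the results are approximations rather than statistics reported by the authors.

```python
# Headline figures as quoted in the summary (rounded by the source).
TOTAL_HOURS = 2_200
NUM_SEGMENTS = 1_360_000
NUM_SPEAKERS = 1_815
SCALE_FACTOR = 25  # "25 times larger" than earlier Persian TTS resources


def corpus_stats(hours, segments, speakers, scale):
    """Derive simple averages from the headline numbers (illustrative only)."""
    return {
        "mean_segment_seconds": hours * 3600 / segments,
        "mean_hours_per_speaker": hours / speakers,
        "mean_segments_per_speaker": segments / speakers,
        "implied_prior_hours": hours / scale,
    }


stats = corpus_stats(TOTAL_HOURS, NUM_SEGMENTS, NUM_SPEAKERS, SCALE_FACTOR)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the average segment is a little under 6 seconds, each speaker contributes roughly 1.2 hours and about 750 segments on average, and the earlier resources would total around 88 hours. These are means only; the summary does not say how evenly speech is distributed across speakers.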

Analysis

ParsVoice addresses a significant gap in open-source speech resources for Persian, a language with over 70 million speakers but minimal representation in public datasets. The research team developed a scalable pipeline that turns raw audiobook recordings into TTS training data through automated sentence boundary detection, punctuation restoration, and speaker identification, which reduces manual annotation costs while maintaining quality standards. The approach shows how existing media can be mined for usable datasets, a pattern that matters more as raw content becomes far more abundant than labeled data.
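
One way to picture the quality-assessment stage of such a pipeline is a filter that compares each segment's aligned text with an ASR transcript and drops segments that disagree too much. The sketch below is a minimal illustration under stated assumptions: the `Segment` fields, the character-error-rate criterion, and the thresholds are all hypothetical and are not taken from the ParsVoice paper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record; field names are illustrative, not from the paper.
    speaker_id: str
    book_text: str     # text aligned from the audiobook's source
    asr_text: str      # transcript produced by an ASR model
    duration_s: float  # segment length in seconds


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def keep_segment(seg: Segment, max_cer=0.1, min_s=1.0, max_s=20.0) -> bool:
    """Keep a segment if its length and text agreement pass assumed thresholds."""
    if not (min_s <= seg.duration_s <= max_s):
        return False
    return char_error_rate(seg.book_text, seg.asr_text) <= max_cer


segments = [
    Segment("spk1", "سلام دنیا", "سلام دنیا", 2.5),  # texts agree: kept
    Segment("spk1", "سلام دنیا", "خداحافظ", 2.5),    # texts disagree: dropped
    Segment("spk2", "سلام", "سلام", 0.4),            # too short: dropped
]
kept = [s for s in segments if keep_segment(s)]
```

A character-level metric is used here only because it needs no tokenizer; a real pipeline might instead rely on word-level agreement, model confidence scores, or audio quality measures.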

The dataset's scale and quality reflect a broader push to extend AI capabilities beyond a handful of well-resourced languages. Previously, Persian TTS research relied on proprietary datasets or much smaller public resources, which put researchers and developers in Persian-speaking regions at a disadvantage. The release of 2,200 hours of TTS-ready data narrows this gap, giving local teams the raw material to build voice synthesis products closer in quality to their English or Mandarin counterparts.

For the AI community, ParsVoice suggests that zero-shot multilingual models like XTTS can achieve respectable results, a naturalness mean opinion score (MOS) of 3.6 out of 5, without language-specific phoneme engineering, which indicates that transfer learning is viable for underrepresented languages. This has implications for scaling TTS to other low-resource languages cost-effectively. Developers targeting Persian markets gain immediate access to large-scale training data, while researchers can now study linguistic phenomena specific to Persian speech at a scale that was previously out of reach.
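
For readers unfamiliar with the metric, a mean opinion score is the average of listener ratings on a 1 to 5 scale. The sketch below shows the calculation with a rough 95% interval based on a normal approximation. The ratings are made up for illustration, chosen only so that they average to the same 3.6 value; they are not data from the paper, and the function name is hypothetical.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation; crude for small listener panels.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up listener ratings, NOT taken from the ParsVoice evaluation.
toy_ratings = [4, 3, 4, 4, 3, 4, 3, 4, 4, 3]
mos, half_width = mean_opinion_score(toy_ratings)
print(f"MOS = {mos:.1f} +/- {half_width:.2f}")
```

Reporting the interval alongside the score matters because MOS values from small panels can shift noticeably with a few different raters.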

Key Takeaways

→With 2,200 hours and 1.36 million segments, ParsVoice is 25 times larger than the previous largest open Persian TTS dataset.
→An automated pipeline combining ASR, BERT classifiers, and quality assessment reduced manual annotation bottlenecks while maintaining data quality.
→Zero-shot multilingual TTS models achieve competitive results on Persian without language-specific phoneme representations.
→The dataset enables local development of voice synthesis applications for a 70+ million speaker language previously underserved by open resources.
→Scalable dataset construction from audiobooks demonstrates a replicable model for expanding AI training data in other low-resource languages.

#persian-language #text-to-speech #dataset-release #multilingual-ai #speech-processing #low-resource-languages #tts-synthesis
