y0news

#tts News & Analysis

12 articles tagged with #tts. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

PilotTTS: A Disciplined Modular Recipe for Competitive Speech Synthesis

PilotTTS demonstrates that competitive text-to-speech systems no longer require massive proprietary datasets or complex architectures. Using only 200K hours of openly processed data and a lightweight autoregressive model, the system achieves industry-leading performance on benchmark tests while supporting voice cloning, emotion synthesis, and multilingual capabilities.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Researchers introduce GibbsTTS, a new zero-shot text-to-speech system using metric-induced discrete flow matching with kinetic-optimal scheduling and moment correction. The method achieves superior naturalness and speaker similarity compared to existing masked generative models and state-of-the-art TTS systems without requiring hyperparameter tuning.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Voxtral TTS

Voxtral TTS is a new multilingual text-to-speech AI model that can generate natural speech from just 3 seconds of reference audio. In human evaluations, it achieved a 68.4% win rate over ElevenLabs Flash v2.5 for voice cloning, demonstrating superior naturalness and expressivity.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer

Researchers introduce SyncSpeech, a new text-to-speech model that combines autoregressive and non-autoregressive approaches using a Temporal Masked Transformer architecture. The model achieves 5.8x lower first-packet latency and 8.8x improved real-time performance while maintaining speech quality comparable to existing models.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Probabilistic Verification of Voice Anti-Spoofing Models

Researchers have developed PV-VASM, a probabilistic framework for verifying the robustness of voice anti-spoofing models against deepfake attacks. The model-agnostic approach estimates misclassification probability under various speech synthesis techniques including text-to-speech and voice cloning, providing formal robustness guarantees against unseen generation methods.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS

Research demonstrates that LoRA fine-tuning of large language models significantly improves text-to-speech systems, achieving up to 0.42 DNS-MOS gains and 34% SNR improvements when training data has sufficient acoustic diversity. The study establishes LoRA as an effective mechanism for speaker adaptation in compact LLM-based TTS systems, outperforming frozen base models across perceptual quality, speaker fidelity, and signal quality metrics.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

Latent Speech-Text Transformer

Facebook Research introduces the Latent Speech-Text Transformer (LST), which aggregates speech tokens into higher-level patches to improve computational efficiency and cross-modal alignment. The model achieves up to +6.5% absolute gain on speech HellaSwag benchmarks while maintaining text performance and reducing inference costs for ASR and TTS tasks.

AI · Neutral · Hugging Face Blog · Feb 27 · 5/10 · 4

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

TTS Arena introduces a new benchmarking platform for evaluating text-to-speech models through community-driven comparisons in real-world scenarios. The platform aims to provide standardized evaluation metrics for TTS quality assessment across different models and use cases.