y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#tts News & Analysis

10 articles tagged with #tts. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

10 articles
AIBullisharXiv โ€“ CS AI ยท Mar 276/10
๐Ÿง 

Voxtral TTS

Voxtral TTS is a new multilingual text-to-speech AI model that can generate natural speech from just 3 seconds of reference audio. In human evaluations, it achieved a 68.4% win rate over ElevenLabs Flash v2.5 for voice cloning, demonstrating superior naturalness and expressivity.

AIBullishMarkTechPost ยท Mar 176/10
๐Ÿง 

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models

Google AI has released WAXAL, an open multilingual speech dataset covering 24 African languages to improve Automatic Speech Recognition and Text-to-Speech systems. This addresses the significant data distribution problem where African languages remain poorly represented in speech technology training corpora.

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models
๐Ÿข Google
AIBullisharXiv โ€“ CS AI ยท Mar 176/10
๐Ÿง 

SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer

Researchers introduce SyncSpeech, a new text-to-speech model that combines autoregressive and non-autoregressive approaches using a Temporal Mask Transformer architecture. The model achieves 5.8x lower first-packet latency and 8.8x improved real-time performance while maintaining comparable speech quality to existing models.

AINeutralarXiv โ€“ CS AI ยท Mar 126/10
๐Ÿง 

Probabilistic Verification of Voice Anti-Spoofing Models

Researchers have developed PV-VASM, a probabilistic framework for verifying the robustness of voice anti-spoofing models against deepfake attacks. The model-agnostic approach estimates misclassification probability under various speech synthesis techniques including text-to-speech and voice cloning, providing formal robustness guarantees against unseen generation methods.

AIBullisharXiv โ€“ CS AI ยท Mar 126/10
๐Ÿง 

When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS

Research demonstrates that LoRA fine-tuning of large language models significantly improves text-to-speech systems, achieving up to 0.42 DNS-MOS gains and 34% SNR improvements when training data has sufficient acoustic diversity. The study establishes LoRA as an effective mechanism for speaker adaptation in compact LLM-based TTS systems, outperforming frozen base models across perceptual quality, speaker fidelity, and signal quality metrics.

AIBullishMarkTechPost ยท Mar 116/10
๐Ÿง 

Fish Audio Releases Fish Audio S2: A New Generation of Expressive Text-to-Speech (TTS) with Absurdly Controllable Emotion

Fish Audio has released S2-Pro, a flagship Large Audio Model (LAM) that enables high-fidelity, multi-speaker text-to-speech synthesis with sub-150ms latency. The system features zero-shot voice cloning capabilities and granular emotion control, representing a shift from traditional modular TTS pipelines to integrated audio models.

AIBullisharXiv โ€“ CS AI ยท Mar 116/10
๐Ÿง 

Latent Speech-Text Transformer

Facebook Research introduces the Latent Speech-Text Transformer (LST), which aggregates speech tokens into higher-level patches to improve computational efficiency and cross-modal alignment. The model achieves up to +6.5% absolute gain on speech HellaSwag benchmarks while maintaining text performance and reducing inference costs for ASR and TTS tasks.

AINeutralHugging Face Blog ยท Feb 275/104
๐Ÿง 

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

TTS Arena introduces a new benchmarking platform for evaluating text-to-speech models through community-driven comparisons in real-world scenarios. The platform aims to provide standardized evaluation metrics for TTS quality assessment across different models and use cases.