#speech-quality News & Analysis

3 articles tagged with #speech-quality. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Researchers identify a trade-off in Spoken Language Models (SLMs) for low-resource languages, which they call the Stability-Expressivity Gap: synthetic data improves phonetic accuracy but suppresses prosodic variability. The proposed self-alignment frameworks, DGSA and TDSC, recover expressivity while maintaining stability, achieving performance comparable to commercial systems and enabling zero-shot voice cloning for Lao.

🧠 Gemini

AI · Neutral · arXiv – CS AI · Jun 19 · 5/10

PrefSQA: Pairwise Preference Prediction for Speech Quality Assessment and the Critical Role of High Quality Datasets

Researchers introduce PrefSQA, a machine learning method that predicts speech quality through pairwise preference comparisons rather than traditional mean opinion scores (MOS). The approach incorporates uncertainty-aware logits and attention mechanisms, and the authors report that preference-based labeling produces cleaner, more reliable datasets than scalar MOS ratings, though the size of the improvement depends heavily on dataset quality.
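
For readers unfamiliar with pairwise preference prediction, the idea can be sketched with a generic Bradley-Terry formulation. This is an illustration only and not PrefSQA's actual model: the function names are hypothetical, the quality scores are assumed to come from some upstream predictor, and the uncertainty-aware logits and attention mechanisms mentioned in the summary are not shown.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry style probability that clip A is preferred over clip B.

    The scores are assumed to be unbounded quality logits from some model
    (hypothetical here); only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_loss(score_a: float, score_b: float, a_preferred: bool) -> float:
    """Binary cross-entropy for a single human preference label."""
    p = preference_probability(score_a, score_b)
    return -math.log(p if a_preferred else 1.0 - p)
```

A listener only has to say which of two clips sounds better, so the label is binary; the loss is small when the predicted score gap agrees with that choice and large when it contradicts it.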

AI · Bullish · arXiv – CS AI · May 7 · 6/10

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions

Researchers introduce JASTIN, an instruction-driven framework that combines frozen audio encoders with fine-tuned LLMs for zero-shot evaluation of generative audio models. The approach achieves state-of-the-art correlation with human ratings across speech, sound, and music evaluation tasks without task-specific retraining.
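
"Correlation with human ratings" is commonly measured with a rank correlation such as Spearman's. The summary does not say which metric JASTIN reports, so the following is a minimal sketch of one plausible choice, with hypothetical function names and no external dependencies.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(model_scores, human_ratings):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    return pearson(average_ranks(model_scores), average_ranks(human_ratings))
```

A value near 1 means the automatic evaluator orders the samples the same way human listeners do, even if the absolute scores sit on different scales.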