AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models
Researchers address a key limitation of Spoken Language Models (SLMs) for low-resource languages by identifying a trade-off they call the Stability-Expressivity Gap: synthetic data improves phonetic accuracy but suppresses prosodic variability. The proposed self-alignment frameworks, DGSA and TDSC, recover expressivity while maintaining stability, reaching performance comparable to commercial systems and enabling zero-shot voice cloning for Lao.