y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#language-technology News & Analysis

2 articles tagged with #language-technology. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBullisharXiv – CS AI · 10h ago7/10
🧠

WorldSpeech: A Multilingual Speech Corpus from Around the World

Researchers introduce WorldSpeech, a multilingual speech corpus containing 65,000 hours of aligned audio-transcript data across 76 languages, addressing the critical gap in ASR training data for low-resource languages. Fine-tuning existing ASR models on this dataset achieves an average 63.5% relative Word-Error-Rate reduction, significantly improving speech recognition accuracy for underrepresented languages.

AINeutralarXiv – CS AI · 10h ago6/10
🧠

Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification

Researchers have created a multilingual text simplification corpus by collecting and aligning sentence-level data from comparable corpora across five languages (Catalan, English, French, Italian, and Spanish). The dataset addresses a critical gap in NLP resources for non-English languages and is publicly available for training and evaluating text simplification models.