#african-languages News & Analysis

5 articles tagged with #african-languages. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles

AIBearisharXiv – CS AI · Jun 27/10

🧠

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

Researchers introduce TukaBench, a jailbreak safety benchmark for seven African languages that reveals LLMs are significantly more vulnerable to adversarial prompts when queried in African languages versus English, with culturally adapted prompts proving most effective at bypassing safety measures. The study identifies critical gaps in LLM safety evaluation for low-resource languages and demonstrates that existing judging mechanisms fail to accurately assess model responses in these languages.

🧠 GPT-5

AIBullisharXiv – CS AI · Mar 37/103

🧠

WAXAL: A Large-Scale Multilingual African Language Speech Corpus

Researchers have released WAXAL, a large-scale multilingual speech dataset covering 24 Sub-Saharan African languages representing over 100 million speakers. The dataset includes 1,250 hours of transcribed speech for ASR and 235 hours of high-quality recordings for TTS, released under CC-BY-4.0 license to advance inclusive AI technologies.

AIBullisharXiv – CS AI · May 96/10

🧠

ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model

Researchers introduced ANGOFA, four pre-trained language models tailored for Angolan languages using Multilingual Adaptive Fine-tuning (MAFT) with OFA embedding initialization and synthetic data. The approach achieved 12.3 and 3.8 point improvements over previous state-of-the-art models, addressing a critical gap in NLP support for very-low resource African languages.

AIBullishMarkTechPost · Mar 176/10

🧠

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models

Google AI has released WAXAL, an open multilingual speech dataset covering 24 African languages to improve Automatic Speech Recognition and Text-to-Speech systems. This addresses the significant data distribution problem where African languages remain poorly represented in speech technology training corpora.

🏢 Google

AIBullishMicrosoft Research Blog · Feb 56/103

🧠

Paza: Introducing automatic speech recognition benchmarks and models for low resource languages

Microsoft Research launched Paza, a human-centered speech recognition pipeline, and PazaBench, the first benchmark leaderboard specifically designed for low-resource languages. The initiative covers 39 African languages with 52 models and has been tested with real communities to improve AI accessibility for underrepresented languages.