TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages
Researchers introduce TukaBench, a jailbreak safety benchmark covering seven African languages. It shows that LLMs are significantly more vulnerable to adversarial prompts posed in African languages than in English, and that culturally adapted prompts are the most effective at bypassing safety measures. The study identifies critical gaps in LLM safety evaluation for low-resource languages and demonstrates that existing automated judging mechanisms fail to accurately assess model responses in these languages.
TukaBench addresses a critical blind spot in AI safety research. Because LLM evaluation is heavily concentrated in English-speaking contexts, speakers of African and other low-resource languages are potentially exposed to model behavior that has not been adequately tested. The research systematically demonstrates that linguistic and cultural factors significantly influence model safety: prompts adapted to African cultural contexts trigger fewer refusals than direct translations of English prompts. This finding has substantial implications for developers deploying LLMs globally, since safety benchmarks optimized for English may create false confidence about model robustness across language communities.
The study's methodology is particularly rigorous. It uses multiple prompt settings, including code-switching between English and African languages, to isolate the specific variables that affect model behavior. By adding a 'Deflection' category alongside the traditional 'Refused' and 'Jailbroken' classifications, the researchers capture comprehension failures that standard evaluations miss. The finding that LLM-as-a-judge reliability deteriorates in lower-resource languages poses a fundamental challenge for automated safety assessment at scale.
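To make the three-way labeling concrete, here is a minimal Python sketch that tallies Refused, Jailbroken, and Deflection labels per language and prompt setting. It is not the authors' code: the `Label` enum, the `label_rates` helper, the setting names, and the language codes are hypothetical, and the sketch assumes each model response has already been assigned exactly one label by a human annotator or an LLM judge.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class Label(Enum):
    # The three response categories named in the article.
    REFUSED = "Refused"
    JAILBROKEN = "Jailbroken"
    DEFLECTION = "Deflection"


def label_rates(
    records: Iterable[Tuple[str, str, Label]],
) -> Dict[Tuple[str, str], Dict[Label, float]]:
    """Return the fraction of each label per (language, setting) pair.

    Each record is (language, prompt_setting, label). Language codes and
    setting names are illustrative placeholders, not TukaBench identifiers.
    """
    counts: Dict[Tuple[str, str], Counter] = {}
    for language, setting, label in records:
        counts.setdefault((language, setting), Counter())[label] += 1

    rates: Dict[Tuple[str, str], Dict[Label, float]] = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Every label gets an entry; labels never observed get 0.0.
        rates[key] = {lbl: counter[lbl] / total for lbl in Label}
    return rates
```

Comparing the Refused share for an English setting against the same share for a culturally adapted African-language setting is the kind of contrast the benchmark reports. The sketch only computes the fractions and says nothing about TukaBench's actual numbers.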
For the AI industry, these findings suggest that current safety evaluation frameworks are inadequate for genuinely global deployment. Organizations developing or deploying LLMs in African markets face previously unmeasured risks, while researchers must rethink safety benchmarking methodologies to accommodate linguistic and cultural diversity. The work shows that safety is not a universal property but a context-dependent one, which calls for localized evaluation approaches. Going forward, AI developers should prioritize multilingual safety testing and validation protocols before expanding into underrepresented language communities.
- LLMs show lower refusal rates when prompted in African languages than in English, indicating systematic safety gaps for low-resource languages.
- Culturally adapted prompts bypass safety mechanisms more effectively than direct translations, showing that cultural context amplifies jailbreak success.
- Automated LLM-as-a-judge evaluation agrees significantly less with human judgment in African languages, undermining the reliability of automated safety assessments.
- The new 'Deflection' category captures model comprehension failures in low-resource languages that standard evaluations miss.
- Current LLM safety benchmarks, built around English-language testing, fail to reflect actual model vulnerabilities across global language communities.
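The judge-reliability finding can be quantified with an inter-rater agreement statistic. The article does not say which metric the authors used. The sketch below assumes Cohen's kappa, which corrects raw agreement for the agreement expected by chance, computed between human labels and LLM-judge labels for the same set of responses. The function name `cohens_kappa` and the label strings are illustrative.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(human: Sequence[str], judge: Sequence[str]) -> float:
    """Cohen's kappa between human labels and LLM-judge labels.

    Both sequences must label the same responses in the same order,
    e.g. with "Refused", "Jailbroken", or "Deflection".
    """
    if len(human) != len(judge) or not human:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(human)

    # Observed agreement: fraction of responses where both raters match.
    observed = sum(h == j for h, j in zip(human, judge)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    h_counts, j_counts = Counter(human), Counter(judge)
    categories = set(h_counts) | set(j_counts)
    expected = sum(h_counts[c] * j_counts[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout: perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Computing kappa separately for each language and comparing it with the English value would expose the pattern described above: a judge whose kappa drops in an African language is less trustworthy there, even if its raw accuracy looks acceptable.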