ToxSyn-PT: A Synthetic Fine-Grained Dataset of Minority-Targeted Toxic Language in Portuguese
Researchers introduce ToxSyn-PT, a large-scale Portuguese dataset for detecting hate speech targeting minority groups, featuring fine-grained annotations and non-toxic counterexamples that existing datasets lack. The study finds that hate speech detection models trained on social media data fail to generalize to minority-specific contexts, exposing gaps in current evaluation metrics and highlighting the need for specialized datasets in non-English languages.
ToxSyn-PT addresses a significant blind spot in natural language processing: the shortage of high-quality training data for hate speech detection in languages other than English, particularly for nuanced, minority-targeted harassment. The dataset's four-stage synthetic generation pipeline produces examples covering 9 protected minority group categories, each annotated with a discourse type that captures rhetorical strategies such as sarcasm and dehumanization. These annotations are crucial for distinguishing genuine hate from casual discussion. This granularity is a methodological advance over the binary toxic/non-toxic labeling prevalent in existing corpora.
The research's most consequential finding challenges how the AI community evaluates model performance. The mutual generalization failure between models trained on social media data and minority-specific contexts suggests that the two are fundamentally different tasks. Standard metrics like Macro F1 can mask catastrophic failures in specific domains, creating a false sense of model robustness. This matters for deployed hate speech detection systems, which may perform adequately on aggregate benchmarks while failing users from minority communities.
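The masking effect can be illustrated with a toy calculation. The sketch below is not taken from the study: the domain names, example counts, and predictions are all hypothetical, chosen only to show how a pooled Macro F1 can look healthy while one domain fails badly. Macro F1 is computed by hand so the example has no dependencies.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (1 = toxic, 0 = non-toxic)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_by_domain(records):
    """Group (domain, true, pred) records and score each domain separately."""
    by_domain = defaultdict(lambda: ([], []))
    for domain, t, p in records:
        by_domain[domain][0].append(t)
        by_domain[domain][1].append(p)
    return {d: macro_f1(t, p) for d, (t, p) in by_domain.items()}


# Hypothetical evaluation set: 90 social media examples classified perfectly,
# plus 10 minority-targeted examples where the model predicts "non-toxic"
# for everything, missing all 5 toxic cases.
records = (
    [("social_media", 1, 1)] * 45
    + [("social_media", 0, 0)] * 45
    + [("minority_targeted", 1, 0)] * 5
    + [("minority_targeted", 0, 0)] * 5
)

pooled = macro_f1([t for _, t, _ in records], [p for _, _, p in records])
per_domain = f1_by_domain(records)

print(f"pooled macro F1: {pooled:.3f}")
for domain, score in per_domain.items():
    print(f"{domain}: {score:.3f}")
```

On this made-up data the pooled score is about 0.95, while the minority-targeted slice scores about 0.33 because every toxic example in it is missed. Reporting the per-domain breakdown alongside the aggregate is what exposes the failure.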
For the broader AI development ecosystem, ToxSyn-PT signals growing recognition that synthetic data can address data scarcity in under-resourced language communities, though synthetic generation introduces its own validation challenges. The dataset is publicly released on Hugging Face, making it freely available to researchers and practitioners. Organizations developing content moderation systems for Portuguese-speaking markets must now contend with evidence that existing approaches inadequately protect minority users, which poses a compliance and reputational risk.
- ToxSyn-PT introduces the first large-scale Portuguese hate speech dataset with explicit minority-group targeting and non-toxic counterexamples absent from competing datasets.
- Models trained on social media data catastrophically fail to generalize to minority-specific hate speech contexts, indicating these are distinct detection problems requiring separate approaches.
- Standard performance metrics like Macro F1 can completely mask model failures in specific domains, necessitating domain-specific evaluation methodologies.
- Synthetic data generation via controlled pipelines offers a viable approach to addressing hate speech detection gaps in low- and mid-resource languages.
- Content moderation systems deployed without minority-specific training data face compliance risks and inadequate protection for vulnerable user populations.