AI · Bearish · arXiv – CS AI · 9h ago · 7/10
When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability
Researchers found that content moderation systems trained on clean English perform significantly worse on code-mixed inputs that blend English and Tamil. When the same content was presented in clean English and in code-mixed form, the allow/flag decision flipped on 26.5% of the pairs. The study identifies workflow-level failures that standard classification metrics miss: more false positives on non-hateful content and a heavier human-review burden.
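The flip rate can be read as a simple paired metric. For each item, compare the decision on the clean-English version with the decision on its code-mixed counterpart, then count the pairs that disagree. The minimal Python sketch below assumes decisions are recorded as the strings "allow" and "flag". The function names `flip_rate` and `flip_breakdown` and the label strings are illustrative and do not come from the study.

```python
from collections import Counter
from typing import Sequence


def flip_rate(english: Sequence[str], code_mixed: Sequence[str]) -> float:
    """Fraction of paired items whose decision differs between the two surface forms.

    Hypothetical helper: english[i] and code_mixed[i] are assumed to be the
    moderation decisions ("allow" or "flag") for the same underlying content.
    """
    if len(english) != len(code_mixed):
        raise ValueError("paired decision lists must have equal length")
    if not english:
        raise ValueError("need at least one pair")
    flips = sum(a != b for a, b in zip(english, code_mixed))
    return flips / len(english)


def flip_breakdown(english: Sequence[str], code_mixed: Sequence[str]) -> Counter:
    """Count flips by direction, e.g. ("allow", "flag") means the code-mixed
    version was flagged although its English counterpart was allowed."""
    return Counter((a, b) for a, b in zip(english, code_mixed) if a != b)


if __name__ == "__main__":
    # Toy decisions for four hypothetical pairs (not data from the study).
    english = ["allow", "allow", "flag", "allow"]
    mixed = ["flag", "allow", "allow", "flag"]
    print(flip_rate(english, mixed))       # 0.75
    print(flip_breakdown(english, mixed))  # Counter({('allow', 'flag'): 2, ('flag', 'allow'): 1})
```

Splitting flips by direction matters because a plain accuracy score can hide them. On non-hateful content, an allow-to-flag flip is an added false positive, and each such flip adds an item to the human-review queue.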