AINeutralarXiv โ CS AI ยท 6h ago2
๐ง
GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules
Researchers introduce GMP, a new benchmark highlighting critical challenges in AI content moderation systems when dealing with co-occurring policy violations and dynamic platform rules. The study reveals that current large language models struggle with consistent moderation when policies are unstable or context-dependent, leading to either over-censorship or allowing harmful content.