arXiv – CS AI · 9h ago
RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation
Researchers introduced RuleSafe-VL, a new benchmark for evaluating how well vision-language models apply explicit content-moderation rules. The benchmark reveals significant gaps in rule-reasoning capability: even top models achieve only 64.8% accuracy on rule-interaction recovery, suggesting that current safety systems may reach correct moderation decisions through superficial pattern matching rather than genuine policy understanding.