#policy-compliance News & Analysis

4 articles tagged with #policy-compliance. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles

AINeutralarXiv – CS AI · May 117/10

🧠

RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation

Researchers introduced RuleSafe-VL, a new benchmark for evaluating how well vision-language AI models apply explicit content moderation rules. The benchmark reveals significant gaps in rule-reasoning capabilities, with even top models achieving only 64.8% accuracy on rule-interaction recovery, indicating current safety systems may reach correct moderation decisions through superficial pattern-matching rather than genuine policy understanding.

AINeutralarXiv – CS AI · Apr 157/10

🧠

Policy-Invisible Violations in LLM-Based Agents

Researchers identified a critical failure mode in LLM-based agents called policy-invisible violations, where agents execute actions that appear compliant but breach organizational policies due to missing contextual information. They introduced PhantomPolicy, a benchmark with 600 test cases, and Sentinel, an enforcement framework using counterfactual graph simulation that achieved 93% accuracy in detecting violations compared to 68.8% for baseline approaches.

AINeutralarXiv – CS AI · May 126/10

🧠

SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making

Researchers present SKG-VLA, an AI system that uses Scene Knowledge Graphs to improve decision-making in large-scale complaint handling by integrating multimodal evidence (text, images, metadata) with structured reasoning about entities, policies, and temporal events. The approach demonstrates improved accuracy and robustness across policy-grounded reasoning and long-tail scenarios.

AINeutralarXiv – CS AI · Apr 146/10

🧠

EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems

Researchers introduce EmbodiedGovBench, a new evaluation framework for embodied AI systems that measures governance capabilities like controllability, policy compliance, and auditability rather than just task completion. The benchmark addresses a critical gap in AI safety by establishing standards for whether robot systems remain safe, recoverable, and responsive to human oversight under realistic failures.