AINeutralarXiv โ CS AI ยท 4h ago7/10
๐ง
Policy-Invisible Violations in LLM-Based Agents
Researchers identified a critical failure mode in LLM-based agents called policy-invisible violations, where agents execute actions that appear compliant but breach organizational policies due to missing contextual information. They introduced PhantomPolicy, a benchmark with 600 test cases, and Sentinel, an enforcement framework using counterfactual graph simulation that achieved 93% accuracy in detecting violations compared to 68.8% for baseline approaches.