AIBearisharXiv – CS AI · 2h ago7/10
🧠
Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles
Researchers introduce WIRE, a diagnostic pipeline for detecting conflicting rules within LLM agent prompt policies. Testing six public policies, the system identified 170 rule-pair conflicts and found that 64.6% of witnessed conflict scenarios resulted in at least one source-rule violation, revealing significant gaps in how language models handle competing policy directives.