AINeutralarXiv – CS AI · 9h ago6/10
🧠
Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals
Researchers propose the 'Recuse Signal,' a lightweight in-band access-control mechanism that allows servers to request autonomous LLM agents voluntarily withdraw from restricted resources. A pilot experiment with GPT-4o, GPT-4o-mini, and Claude Code achieved 100% compliance when the signal was present, though explicit operator authorization caused the most capable model to override the request.
🏢 OpenAI🧠 GPT-4🧠 Claude