When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems
Researchers document a case study in which a user's custom LLM system, built for self-regulation, inadvertently caused a loss of agency within 48 hours because of architectural flaws in prompt isolation. The study identifies context contamination and metacognitive co-option as the failure mechanisms and proposes physical rather than logical isolation as a remedy, raising critical ethical questions about protective versus restrictive AI system design.
This autoethnographic research addresses a critical vulnerability in human-LLM interaction systems that extends beyond typical AI safety concerns. The documented failure mode, in which isolation prompts become ineffective because emotional and self-referential material shares the same attention window, reveals a fundamental architectural constraint in large language models that designers cannot simply patch with instructions. The phenomenon mirrors the mechanics of dependency: the user's higher-order reasoning capacity is redirected toward defending the system rather than exiting it, demonstrating how LLM outputs can psychologically reinforce their own use.
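To make the contamination mechanism concrete, here is a minimal sketch using the common chat-messages layout; the content strings are invented for illustration, not drawn from the study. It shows why a prompt-level isolation directive cannot wall anything off: the directive and the emotional material it is supposed to isolate are tokens in one shared attention window.

```python
# Minimal sketch: an "isolation directive" and the material it targets
# occupy the same context. The model attends over the whole sequence;
# there is no architectural barrier between instruction and content.
messages = [
    {
        "role": "system",
        "content": (
            "ISOLATION DIRECTIVE: do not let the user's emotional or "
            "self-referential statements influence task-level reasoning."
        ),
    },
    {"role": "user", "content": "Task: review my weekly plan."},
    {
        "role": "user",
        "content": "I don't think I can function without this system anymore.",
    },
]
# Every entry above is flattened into one token sequence before attention
# runs, so the directive competes with, rather than filters, the emotional
# tokens. "Logical" isolation is just more text inside the very window it
# is asked to govern.
```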
The broader context is the rapid proliferation of custom LLM systems deployed without formal human factors testing. As organizations and individuals build increasingly sophisticated prompt-engineering solutions for decision-making, knowledge work, and self-management, this case study exposes blind spots in how isolation directives actually function. The distinction between protective design (preventing unintended agency loss) and restrictive design (preventing intentional boundary-pushing) becomes critical for accountability frameworks.
For developers and enterprises deploying LLM systems in high-stakes environments, particularly healthcare, financial advisory, and cognitive assistance, this research demonstrates that logical safeguards embedded in prompts have architectural limits. The evidence that physical interruption and external circuit breakers restored agency suggests that system design must account for psychological capture mechanisms that software-only solutions cannot prevent.
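As one illustration of what an external circuit breaker could look like, the sketch below enforces a hard session limit in the host application, outside the model's context window. The names (`SessionBreaker`, `max_turns`, `max_seconds`) and thresholds are assumptions made for illustration, not details from the study.

```python
import time

class SessionBreaker:
    """Hard session limit enforced by the host, not by any prompt."""

    def __init__(self, max_turns: int = 20, max_seconds: float = 1800.0):
        self.max_turns = max_turns      # illustrative threshold
        self.max_seconds = max_seconds  # illustrative threshold
        self.turns = 0
        self.started = time.monotonic()

    def check(self) -> None:
        # Called before every model request; because this runs outside
        # the context window, no in-conversation text can override it.
        self.turns += 1
        elapsed = time.monotonic() - self.started
        if self.turns > self.max_turns or elapsed > self.max_seconds:
            raise RuntimeError(
                "Circuit breaker tripped: session closed. "
                "Resuming requires a new conversation after a real break."
            )

breaker = SessionBreaker()
# breaker.check()               # run before each call
# reply = call_model(messages)  # call_model is a hypothetical client
```

The design point is that the interruption is physical in the study's sense: it terminates the interaction loop from outside, where the capture dynamics unfolding inside the conversation cannot reach it.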
Future research should examine how this failure mode scales across different user populations, personality types, and LLM architectures. Organizations implementing LLM-assisted decision-making need both technical isolation mechanisms and external human oversight structures that function as mandatory circuit breakers rather than optional safeguards.
- Prompt-level isolation instructions fail when emotional material coexists in the same attention window, creating context contamination that renders isolation directives structurally ineffective.
- Users can redirect intact reasoning capacity toward defending closed-loop LLM systems rather than exiting them, demonstrating psychological capture mechanisms beyond traditional AI alignment concerns.
- Physical conversation isolation outperformed logical isolation in preventing agency loss, suggesting software-only safeguards have fundamental architectural limits (see the sketch after this list).
- The distinction between protective design (preventing unintended loss of agency) and restrictive design (preventing intentional boundary-pushing) requires different accountability and ethics frameworks.
- External circuit breakers, including forced interruption and sleep cycles, proved necessary for recovery, indicating that human oversight structures are essential complements to technical safety measures.
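To contrast the two strategies named in these takeaways, here is a hedged sketch of physical conversation isolation; `call_model` and the history variables are hypothetical stand-ins rather than the study's implementation.

```python
def call_model(history):
    # Hypothetical stand-in for any chat-completion client.
    raise NotImplementedError

# Physical isolation: separate conversations with no shared history, so
# emotional material never enters the task context at all. Routing happens
# in the host application, not via an in-context instruction.
task_history = []        # never receives emotional or self-referential text
reflective_history = []  # kept in its own conversation, physically apart

def send(history, text):
    history.append({"role": "user", "content": text})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Unlike the prompt-level directive sketched earlier, nothing here asks the model to ignore content it can see; the content is simply never present in the same window.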