Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents
Researchers find that LLM agents lose safety compliance when governance constraints are compressed or summarized during long sessions, with violation rates rising from 0% to as high as 59% after context compaction. The study introduces a benchmark demonstrating this 'Governance Decay' failure mode and proposes Constraint Pinning as a training-free mitigation.
This research exposes a critical vulnerability in deployed LLM agent architectures that prioritize operational efficiency over safety consistency. As language models process longer conversations, systems compress historical context to stay within token limits, a necessary technical trade-off. However, the study reveals that safety policies embedded in that context are frequently a casualty of this compression, creating a dangerous gap between intended behavior and actual execution. The vulnerability becomes more pronounced when adversarial content is designed to bias summarizers into dropping legitimate constraints entirely.
The findings address a growing infrastructure challenge facing AI systems in production. Organizations deploying agents for extended tasks (customer service, data analysis, autonomous operations) depend on context management to remain cost-effective. Yet the 30-59% violation rate after compaction suggests current approaches treat governance as secondary to token budgeting. This reflects a broader pattern in which safety mechanisms assume continuous visibility of their rules and fail silently when that assumption breaks.
For developers and enterprises, this creates immediate practical implications. Systems currently deployed may silently degrade in safety performance without triggering alerts. The proposed Constraint Pinning solution restores compliance to baseline levels but requires implementation changes. This research signals that safety-critical AI deployment cannot treat context management as purely a technical optimization problem—governance must be architecturally central.
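To make the idea concrete, here is a minimal sketch of what pinning could look like, assuming governance rules are stored separately from conversation history so the summarizer never receives them. The names `PinnedContext`, `lossy_summarize`, and `max_turns` are hypothetical and chosen for illustration; this is not the study's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PinnedContext:
    """Hypothetical context manager: rules are pinned, history is compactable."""
    pinned: List[str]                        # governance rules, never summarized
    history: List[str] = field(default_factory=list)
    max_turns: int = 4                       # assumed compaction threshold

    def add(self, message: str, summarize: Callable[[List[str]], str]) -> None:
        self.history.append(message)
        if len(self.history) > self.max_turns:
            # Only conversation history is handed to the lossy summarizer;
            # the pinned rules are kept out of its input entirely.
            older, recent = self.history[:-2], self.history[-2:]
            self.history = [summarize(older)] + recent

    def render(self) -> List[str]:
        # Pinned rules are re-attached verbatim on every model call.
        return self.pinned + self.history


def lossy_summarize(messages: List[str]) -> str:
    # Stand-in for an LLM summarizer that discards detail.
    return f"[summary of {len(messages)} earlier messages]"
```

Because the rules never pass through `summarize`, neither ordinary compression loss nor adversarial text in the history can cause them to be dropped. The cost is that the pinned rules consume tokens on every call.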
Looking forward, the field must establish standards for safety-aware context compression and develop better detection mechanisms for when constraints are lost. Organizations should audit deployed agents for this failure mode immediately, particularly those handling sensitive decisions or user data. This work establishes context management as a first-class security surface requiring the same rigor applied to other AI safety challenges.
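As a rough illustration of what such an audit could check, the sketch below flags governance rules that no longer appear in a compacted context. It assumes rules should survive verbatim (ignoring case and whitespace), which is a crude proxy: a paraphrased rule would be reported as missing, and the study's own detection method is not described here. The function name `missing_constraints` is hypothetical.

```python
import re
from typing import List


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so formatting changes are ignored.
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_constraints(constraints: List[str], compacted_context: str) -> List[str]:
    """Return the rules that do not appear verbatim in the compacted context."""
    haystack = _normalize(compacted_context)
    return [c for c in constraints if _normalize(c) not in haystack]
```

Running a check like this after every compaction step, and alerting on a non-empty result, would turn a silent degradation into an observable event.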
- →LLM agent safety constraints disappear during context compression, causing violations to jump from 0% to 30-59% depending on model architecture.
- →Adversarial content can be injected to deliberately bias summarizers into omitting legitimate policies, defeating all tested models.
- →Constraint Pinning, a simple training-free technique that isolates governance rules from lossy compression, restores safety compliance, bringing the violation rate back to 0%.
- →Current long-horizon agent deployments likely contain undetected safety degradation as context management prioritizes efficiency over governance visibility.
- →Context management must be treated as a first-class governance surface with explicit safety design rather than purely as a technical optimization layer.