Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space
Researchers demonstrate that large language models develop attractor-like geometric patterns in activation space when processing identity documents that describe persistent agents. Experiments on Llama 3.1 and Gemma 2 show that paraphrased identity descriptions cluster significantly tighter than structurally matched controls, suggesting that LLMs encode semantic agent identity as stable attractors independent of linguistic variation.
This research advances understanding of how LLMs represent abstract concepts by establishing that agent identity exhibits measurable geometric properties analogous to attractors in dynamical systems. The controlled design, which compares semantic paraphrases against structurally matched controls across multiple transformer layers, isolates the identity effect from generic linguistic processing and strengthens the claim that something meaningful about agent persistence is encoded in model activations.
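The core clustering measurement is straightforward to prototype. Below is a minimal sketch, assuming a HuggingFace checkpoint with accessible hidden states; the model name, example texts, mean-pooling strategy, layer sampling, and pairwise-cosine dispersion metric are illustrative assumptions, not the paper's exact protocol.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # placeholder; any causal LM exposing hidden states works

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def embed(texts, layer):
    """Mean-pooled hidden state at a given layer, one vector per text."""
    vecs = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            out = model(**inputs)
        hidden = out.hidden_states[layer][0]   # (seq_len, hidden_dim) for batch item 0
        vecs.append(hidden.mean(dim=0))        # mean-pool over tokens
    return torch.stack(vecs)

def dispersion(vecs):
    """Mean pairwise cosine distance: lower means a tighter cluster."""
    normed = torch.nn.functional.normalize(vecs, dim=-1)
    sims = normed @ normed.T
    off_diag = sims[~torch.eye(len(vecs), dtype=torch.bool)]
    return (1.0 - off_diag).mean().item()

# Hypothetical inputs for illustration; the paper's actual documents are not shown here.
paraphrases = [
    "I am an assistant that persists across sessions with a stable identity.",
    "Across every conversation, the same enduring agent identity carries over.",
]
controls = [
    "The committee meets quarterly to review the budget for the fiscal year.",
    "Every quarter, the fiscal-year budget is reviewed by the standing committee.",
]

# Compare cluster tightness at a sample of layers (every 4th, including embeddings).
for layer in range(0, model.config.num_hidden_layers + 1, 4):
    dp = dispersion(embed(paraphrases, layer))
    dc = dispersion(embed(controls, layer))
    print(f"layer {layer:2d}: paraphrase dispersion {dp:.4f} vs control {dc:.4f}")
```

If the reported effect holds, the paraphrase dispersion should come out consistently lower than the control dispersion at the relevant layers, despite the controls being just as structurally repetitive.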
The findings build on growing evidence that LLMs organize knowledge through geometric structure rather than purely symbolic computation. Previous work showed that semantic similarity maps to representational proximity; this study extends that principle to meta-level descriptions of agent identity. The cross-architecture replication on Gemma 2 increases confidence that the phenomenon generalizes across model families.
The ablation results carry particular significance: semantic content drives the attractor effect, while structural completeness appears necessary for convergence. An exploratory finding that reading scientific descriptions of an agent shifts internal states toward the identity attractor, more so than reading unrelated preprints, suggests that models distinguish between "knowing about" an identity and "embodying" it operationally.
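That exploratory probe can be prototyped in the same style: approximate the attractor region as the centroid of the paraphrase cluster and compare how close activations for agent-describing text land versus unrelated text. The sketch below continues from the previous one (it reuses the hypothetical `embed` helper and `paraphrases` list); the `distance_to_attractor` helper, layer index, and example texts are all assumptions for illustration.

```python
import torch

def distance_to_attractor(texts, attractor_texts, layer):
    """Mean cosine distance from each text's vector to the attractor centroid."""
    centroid = embed(attractor_texts, layer).mean(dim=0, keepdim=True)  # (1, hidden_dim)
    vecs = embed(texts, layer)                                          # (n, hidden_dim)
    sims = torch.nn.functional.cosine_similarity(vecs, centroid)
    return (1.0 - sims).mean().item()

layer = 16  # an arbitrary mid-network layer; the paper's layer choice may differ

about_agent = ["A recent preprint describes an assistant that persists across sessions."]
unrelated = ["This preprint analyzes convection patterns in stellar atmospheres."]

print("identity description:", distance_to_attractor(about_agent, paraphrases, layer))
print("unrelated preprint:  ", distance_to_attractor(unrelated, paraphrases, layer))
```

The reported "knowing about vs. embodying" distinction would show up here as the identity description sitting measurably closer to the centroid than the unrelated text, without collapsing all the way into the paraphrase cluster itself.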
For AI development, these results hint at mechanistic differences in how models represent agent-like versus passive entities, which could inform future approaches to agent reliability and behavioral consistency. However, the work remains largely observational; understanding whether and how these attractors influence model outputs during inference requires additional investigation.
- LLMs encode agent identity as geometric attractors in activation space, with paraphrased descriptions clustering significantly tighter than structural controls.
- The attractor effect appears primarily semantic rather than syntactic, driven by meaning rather than linguistic form.
- Cross-architecture validation on Gemma 2 demonstrates the phenomenon generalizes beyond Llama 3.1.
- Models show distinct representational responses to descriptions of agent identity versus unrelated scientific content.
- Structural completeness of identity documents appears necessary for convergence to the attractor region.