Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
Researchers discovered that Llama3-8b-Instruct can reliably recognize its own generated text through a specific vector in its neural network that activates during self-authorship recognition. The study demonstrates this self-recognition ability can be controlled by manipulating the identified vector to make the model claim or disclaim authorship of any text.