Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking
Researchers have discovered SeedHijack, a supply-chain attack that compromises LLM watermarking schemes by hijacking the pseudo-random number generator (PRNG) used in watermark implementations. The attack amplifies watermark signals while remaining undetectable by current defense mechanisms, exposing a critical vulnerability in cryptographic content-provenance systems that assume the PRNG can be trusted.
The discovery of SeedHijack reveals a fundamental architectural weakness in widely deployed LLM watermarking defenses. Watermarking has emerged as a critical tool for attributing AI-generated content and preventing misuse, with schemes such as KGW, Unigram, and DipMark gaining adoption across the industry. However, this research demonstrates that their security guarantees collapse when the underlying PRNG, typically treated as a trusted component, is compromised at the supply-chain layer.
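To make the trust assumption concrete, here is a minimal sketch of KGW-style green-list partitioning. It assumes the seed is a SHA-256 hash of a secret key and the previous token id, and that Python's `random.Random` stands in for the watermark library's PRNG; the helper names `green_list` and `bias_logits` and the values gamma = 0.25 and delta = 2.0 are hypothetical, illustrative choices, not taken from any scheme's reference code.

```python
import hashlib
import random


def green_list(key: bytes, prev_token: int, vocab_size: int, gamma: float = 0.25) -> set[int]:
    """Hypothetical KGW-style helper: the green token ids for the next position."""
    # Derive a deterministic seed from the secret key and the previous token id.
    digest = hashlib.sha256(key + prev_token.to_bytes(8, "big")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # The PRNG is trusted implicitly: it alone decides which tokens are green.
    rng = random.Random(seed)
    vocab = list(range(vocab_size))
    rng.shuffle(vocab)
    return set(vocab[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], green: set[int], delta: float = 2.0) -> list[float]:
    """Soft watermark: add delta to every green token's logit before sampling."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


# Generation and detection re-derive the same list from the same key and context.
g = green_list(b"secret-key", prev_token=42, vocab_size=1000)
assert g == green_list(b"secret-key", prev_token=42, vocab_size=1000)
print(len(g))  # 250
```

Everything downstream of `rng` depends on the PRNG behaving as specified: substituting that generator at the library level changes which tokens are green while the key and the hashing step stay untouched.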
What makes SeedHijack particularly dangerous is its threat profile: it requires no access to watermark keys or model internals, yet amplifies watermark signals by up to 2.42x while remaining invisible to every content-side detector tested. This is a qualitative departure from earlier attacks, because traditional watermark evasion requires degrading text quality or erasing the watermark signal entirely. By instead biasing the green-list selection process, SeedHijack achieves amplification without either trade-off, so detectors cannot analytically distinguish its output from legitimately watermarked text.
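Detection in KGW-style schemes typically reduces to a one-proportion z-test on how many tokens landed in the green list: z = (G - gamma*T) / sqrt(T*gamma*(1 - gamma)), where G is the green-token count among T tokens. The sketch below computes that statistic for hypothetical counts (not figures from the SeedHijack study) with an assumed threshold of 4; whether the tested detectors use exactly this statistic is also an assumption. It illustrates why a test of this shape has no notion of "too much" watermark.

```python
import math


def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-test on the green-token count (KGW-style detection)."""
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1 - gamma))
    return (green_count - expected) / std


THRESHOLD = 4.0  # illustrative decision threshold

# Hypothetical green-token counts for a 200-token passage.
cases = {"unwatermarked": 50, "watermarked": 110, "amplified": 150}
for label, greens in cases.items():
    z = kgw_z_score(greens, 200)
    print(f"{label}: z = {z:.2f}, flagged = {z > THRESHOLD}")
```

With 200 tokens and gamma = 0.25, z rises from 0 to roughly 9.8 and 16.3 across the three cases. Both watermarked cases are flagged, and nothing in the statistic separates an amplified signal from a legitimately strong one.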
For the AI safety and content-provenance ecosystem, this finding challenges fundamental assumptions about defense-in-depth. Enterprise and open-source LLM deployments that rely on watermarking for regulatory compliance or brand protection face unexpected risk exposure. Because the attack enters through the supply chain, the vulnerability likely extends beyond individual model deployments to library-level implementations.
The proposed quantum random number generator (QRNG) countermeasure neutralizes the attack entirely while preserving watermark utility, but whether it will be adopted in practice remains uncertain. Attention now shifts to PRNG integrity auditing, attestation of secure entropy sources, and possible cryptographic redesigns that do not assume a trustworthy PRNG. Organizations deploying watermarking-based solutions should prioritize verifying their PRNG sources and adopting entropy validation protocols.
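As one small building block of entropy validation, the sketch below implements the frequency (monobit) test from NIST SP 800-22 and applies it to bits drawn from a PRNG. It assumes bit-level access to the generator under audit; the helper name `monobit_p_value` and the 60%-biased "tampered" source are hypothetical illustrations, not part of the SeedHijack work.

```python
import math
import random


def monobit_p_value(bits: list[int]) -> float:
    """NIST SP 800-22 frequency (monobit) test: p-value for 0/1 balance."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)  # map 0 -> -1 and 1 -> +1, then sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


ALPHA = 0.01  # significance level recommended in SP 800-22
N = 100_000

# Bits from the generator under audit (here an ordinary seeded random.Random).
audited = random.Random(1234)
audited_bits = [audited.getrandbits(1) for _ in range(N)]

# Hypothetical tampered source that emits 1 about 60% of the time.
tampered = random.Random(1234)
biased_bits = [1 if tampered.random() < 0.6 else 0 for _ in range(N)]

for label, bits in [("audited", audited_bits), ("tampered", biased_bits)]:
    p = monobit_p_value(bits)
    print(f"{label}: p = {p:.4g}, passes = {p >= ALPHA}")
```

On the 10-bit sequence 1011010101 used as a worked example in SP 800-22, the function returns about 0.527. Passing this test says little on its own: a hijacked PRNG that substitutes a different but well-balanced sequence would pass, so statistical checks like this need to be paired with source verification and attestation.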
- SeedHijack is the first supply-chain attack on LLM watermarking that amplifies rather than erases watermark signals while remaining undetectable.
- The attack exploits PRNG compromise at the implementation layer, exposing a critical assumption in the KGW, Unigram, and DipMark watermarking schemes.
- Current content-side statistical detectors failed entirely against SeedHijack across the tested LLM implementations, revealing gaps in existing detection mechanisms.
- Quantum random number generators can fully neutralize the attack, establishing PRNG integrity as a first-class security requirement.
- The vulnerability affects enterprise and open-source LLM deployments that rely on watermarking for content attribution and regulatory compliance.