y0news
🧠 AI · 🔴 Bearish · Importance 7/10 · Actionable

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

arXiv – CS AI | Ziyang You, Huilong He, Xiaoke Yang, Xuxing Lu
🤖 AI Summary

Researchers have introduced SeedHijack, a supply-chain attack that compromises LLM watermarking schemes by hijacking the pseudo-random number generator (PRNG) their implementations rely on. Rather than weakening the watermark, the attack amplifies its signal while remaining undetectable by current defenses, exposing a critical vulnerability in cryptographic content-provenance systems that assume the PRNG can be trusted.

Analysis

The discovery of SeedHijack reveals a fundamental architectural weakness in widely deployed LLM watermarking defenses. Watermarking has emerged as a critical tool for attributing AI-generated content and preventing misuse, with schemes such as KGW, Unigram, and DipMark gaining adoption across the industry. This research shows, however, that their security guarantees collapse when the underlying PRNG, typically treated as a trusted component, is compromised at the supply-chain layer.
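The role of the PRNG is easiest to see in a KGW-style green list. The sketch below is a simplified illustration, not the KGW reference implementation or the paper's code: it assumes a toy 1,000-token vocabulary, a hypothetical key, SHA-256 seed derivation, and Python's `random.Random` (not cryptographically secure) as a stand-in for whatever PRNG a real watermarking library calls. The point it makes is that the green list is a pure function of the PRNG's output stream, so a substituted PRNG changes the partition even though the key and the model are untouched.

```python
import hashlib
import random

VOCAB_SIZE = 1000          # toy vocabulary size (assumption)
GAMMA = 0.25               # fraction of the vocabulary placed on the green list
SECRET_KEY = b"demo-key"   # hypothetical watermark key


def seed_from_context(prev_token: int, key: bytes = SECRET_KEY) -> int:
    """Derive a PRNG seed from the secret key and the previous token id."""
    digest = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def green_list(prev_token: int, rng_factory=random.Random) -> set[int]:
    """Return the green-list token ids for the next position.

    rng_factory stands in for the PRNG the watermark library trusts;
    replacing it changes the partition without touching the key.
    """
    rng = rng_factory(seed_from_context(prev_token))
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)  # PRNG-driven permutation of the vocabulary
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def bias_logits(logits: list[float], prev_token: int, delta: float = 2.0) -> list[float]:
    """Soft watermark: add delta to the logit of every green-list token."""
    greens = green_list(prev_token)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]
```

Calling `green_list(42)` twice returns the same 250-token set, which is what lets a detector recompute the partition from the key alone; passing a different `rng_factory` would silently yield a different set.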

What makes SeedHijack particularly dangerous is its combination of low access requirements and stealth: it needs no access to watermark keys or model internals, yet amplifies watermark signals by up to 2.42x while remaining invisible to every content-side detector tested. This departs from earlier attacks on watermarking, which typically either degrade text quality or erase the watermark signal entirely. By biasing the green-list selection process instead, SeedHijack amplifies the signal without those trade-offs, so the resulting text is statistically indistinguishable from legitimately watermarked output.
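One way to see why a stronger signal need not look suspicious: KGW-style detectors apply a one-sided z-test to the number of green tokens, z = (|s|_G - γT) / sqrt(Tγ(1 - γ)), where T is the number of scored tokens and γ the green-list fraction, and flag text only when z exceeds a threshold. The sketch below assumes γ = 0.25, T = 200, a threshold of 4, and made-up green counts; it does not reproduce the paper's experiments or its 2.42x figure. A larger green count simply pushes z further past the threshold, and nothing in the test treats an unusually strong signal as an anomaly.

```python
import math


def kgw_z_score(green_count: int, total_tokens: int, gamma: float) -> float:
    """One-proportion z-test used by KGW-style detectors.

    Under the null hypothesis (no watermark) each token is green with
    probability gamma, so the green count has mean gamma*T and
    variance T*gamma*(1 - gamma).
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


def is_watermarked(z: float, threshold: float = 4.0) -> bool:
    """One-sided test: there is no upper bound that flags an overly strong signal."""
    return z > threshold


# Same length and gamma; the second text just has more green tokens, as a
# hypothetical hijacked PRNG steering green-list selection might produce.
T, GAMMA = 200, 0.25
normal = kgw_z_score(green_count=100, total_tokens=T, gamma=GAMMA)
amplified = kgw_z_score(green_count=150, total_tokens=T, gamma=GAMMA)
print(f"normal z={normal:.2f}, amplified z={amplified:.2f}")
print(is_watermarked(normal), is_watermarked(amplified))
```

Both texts are classified as watermarked; the detector reports only that the second one carries a stronger watermark.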

For the AI safety and content-provenance ecosystem, this finding challenges basic assumptions about defense-in-depth. Enterprise and open-source LLM deployments that rely on watermarking for regulatory compliance or brand protection face risk exposure they may not have accounted for. Because the attack enters through the supply chain, the vulnerability likely extends beyond individual model deployments to shared library-level implementations.

The proposed countermeasure, a quantum random number generator (QRNG), neutralizes the attack entirely while preserving watermark utility, but whether it will be adopted in practice remains uncertain. Near-term attention shifts toward PRNG integrity auditing, attestation of secure entropy sources, and cryptographic redesigns that do not assume PRNG trustworthiness. Organizations deploying watermarking-based solutions should prioritize verifying their PRNG sources and validating entropy.
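A minimal form of PRNG source verification is a known-answer test: record a fingerprint of a seeded generator's output from a trusted build, then check that the deployed generator reproduces it. The sketch below is an illustration under stated assumptions, not a protocol from the paper: `TamperedRandom` is a hypothetical hijacked generator, and the seed, draw count, and SHA-256 fingerprint are arbitrary choices. It applies only to deterministic PRNGs, and a hijack that behaves honestly on the audit seed would pass it, so it complements rather than replaces entropy-source attestation.

```python
import hashlib
import random


def prng_fingerprint(rng_factory, seed: int = 1234, n: int = 1024) -> str:
    """Known-answer test: SHA-256 over the first n 32-bit outputs of a seeded PRNG."""
    rng = rng_factory(seed)
    h = hashlib.sha256()
    for _ in range(n):
        h.update(rng.getrandbits(32).to_bytes(4, "big"))
    return h.hexdigest()


class TamperedRandom(random.Random):
    """Hypothetical hijacked PRNG that clears the lowest bit of every draw."""

    def getrandbits(self, k: int) -> int:
        return super().getrandbits(k) & ~1


# In practice the reference would be recorded once from a trusted build and
# pinned in deployment configuration; here it is computed in-process for brevity.
REFERENCE = prng_fingerprint(random.Random)


def audit(rng_factory) -> bool:
    """Return True if the generator reproduces the pinned reference stream."""
    return prng_fingerprint(rng_factory) == REFERENCE


print("trusted PRNG passes:", audit(random.Random))
print("tampered PRNG passes:", audit(TamperedRandom))
```

Hashing the whole output stream rather than checking a few values means any deviation in the first 1,024 draws changes the fingerprint.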

Key Takeaways
  • SeedHijack is the first supply-chain attack on LLM watermarking that amplifies rather than erases watermark signals while remaining undetectable.
  • The attack exploits PRNG compromise at the implementation layer, exposing a critical trust assumption (PRNG integrity) shared by the KGW, Unigram, and DipMark watermarking schemes.
  • All tested content-side statistical detectors failed to flag SeedHijack across the evaluated LLMs, revealing gaps in current detection mechanisms.
  • Quantum random number generators can fully neutralize the attack, establishing PRNG integrity as a first-class security requirement.
  • The vulnerability affects enterprise and open-source LLM deployments relying on watermarking for content attribution and regulatory compliance.