
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

arXiv – CS AI | Yao Tong, Haonan Wang, Siquan Li, Kenji Kawaguchi, Tianyang Hu
🤖 AI Summary

Researchers have developed SeedPrints, a novel fingerprinting method that identifies Large Language Models based on their random initialization seed rather than post-training characteristics. This approach enables model attribution and provenance verification from inception through full pretraining, addressing limitations of existing methods that only work reliably after fine-tuning.

Analysis

SeedPrints represents a fundamental advancement in LLM provenance verification by shifting the fingerprinting paradigm from post-hoc signatures to intrinsic, seed-dependent identifiers present before training begins. Traditional fingerprinting methods rely on signatures that emerge during fine-tuning, making them unreliable during large-scale pretraining—the phase where models acquire most of their capacity and knowledge. This research gap has significant implications for model attribution, intellectual property protection, and supply chain verification in the increasingly competitive LLM development landscape.

The method exploits prediction biases induced by random initialization, creating persistent identifiers detectable throughout a model's entire lifecycle. By validating SeedPrints across LLaMA-style and Qwen-style architectures under various conditions—including domain shifts and parameter modifications—the researchers demonstrate robustness that existing techniques cannot achieve. This addresses a critical security need as organizations deploy increasingly sophisticated models and face growing concerns about model theft, unauthorized modification, and provenance disputes.
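The article does not include code, so the following is only a toy illustration of the core idea, not the authors' method: a randomly initialized "model" (here, a seed-dependent projection matrix standing in for real LLM weights) induces a stable bias in which outputs it prefers, and that bias ordering can serve as a fingerprint. All names (`init_model`, `bias_fingerprint`, `match_score`) and the tiny dimensions are hypothetical.

```python
import random

def init_model(seed, vocab_size=50, dim=16):
    # Hypothetical stand-in for random weight initialization:
    # a seed-dependent output-projection matrix (vocab_size x dim).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def bias_fingerprint(weights, probes):
    # Rank the "vocabulary" by mean score over fixed probe inputs.
    # The rank ordering reflects initialization-induced prediction bias.
    scores = [sum(sum(w * x for w, x in zip(row, p)) for p in probes)
              for row in weights]
    return sorted(range(len(scores)), key=lambda t: -scores[t])

def match_score(fp_a, fp_b):
    # Fraction of rank positions where two fingerprints agree exactly.
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

# Fixed probe inputs shared by all comparisons.
probe_rng = random.Random(0)
probes = [[probe_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]

fp_a = bias_fingerprint(init_model(seed=42), probes)
fp_b = bias_fingerprint(init_model(seed=42), probes)  # same seed
fp_c = bias_fingerprint(init_model(seed=7), probes)   # different seed
```

Under this toy setup, `match_score(fp_a, fp_b)` is exactly 1.0 (same seed, deterministic initialization) while `match_score(fp_a, fp_c)` is near chance level, which is the intuition behind seed-level distinguishability; the actual paper works with real LLM prediction biases, which is far more involved.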

For the AI industry, SeedPrints enables stronger model attribution mechanisms essential for protecting proprietary development and verifying legitimate model ownership. The technology provides developers and organizations with statistically grounded evidence of model lineage independent of training dynamics. For enterprise users and regulatory bodies, reliable model fingerprinting supports transparency requirements and helps combat counterfeit or backdoored model variants. As LLM commercialization accelerates and IP disputes multiply, this capability becomes increasingly valuable for establishing definitive chain-of-custody evidence and preventing model impersonation attacks.

Key Takeaways
  • SeedPrints fingerprints LLMs using random initialization seeds rather than post-training signatures, enabling attribution from model inception onwards.
  • The method remains detectable throughout entire training trajectories, from untrained models through large-scale pretraining and downstream adaptation.
  • Existing fingerprinting methods fail during early pretraining, but SeedPrints maintains effectiveness across all training stages and distribution shifts.
  • Validation on LLaMA and Qwen architectures confirms seed-level distinguishability and robustness under parameter modifications.
  • SeedPrints addresses critical IP protection and provenance verification needs in the competitive LLM development ecosystem.