SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
Researchers propose SWAP, a sequential watermarking technique to protect the copyright of soft prompts used in vision-language models such as CLIP. The method embeds watermarks as a specific order of out-of-distribution classes, addressing a fundamental limitation of existing auditing approaches: their watermarking objective conflicts with primary task performance.
The paper addresses a critical gap in AI model security as vision-language models become increasingly valuable intellectual property. Soft prompts—lightweight task-specific adaptations of large foundation models—represent significant research and development investment, yet lack robust copyright protection mechanisms. Existing auditing methods fail fundamentally: non-intrusive approaches generate false positives when different models train on similar data, while intrusive backdoor techniques either cannot embed functional triggers or compromise model performance.
The core innovation of SWAP stems from recognizing that previous watermarking attempts operated in the same decision space as the primary task, creating inherent conflicts. By shifting watermarks to a more complex space using ordered out-of-distribution class predictions, SWAP maintains primary task performance while embedding verifiable ownership markers. This approach leverages CLIP's zero-shot capability, making watermarks harder to detect or remove without degrading utility.
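To make the idea of an "ordered out-of-distribution class" watermark concrete, the following is a minimal sketch, not the paper's implementation. It assumes the watermark is read off as the relative ranking of a few owner-chosen out-of-distribution (OOD) classes in the model's scores, while the top prediction over the task classes is left alone. The function names, the `secret_order` list, and the toy scores are all hypothetical.

```python
def follows_secret_order(logits, secret_order):
    """Return True if the OOD class scores strictly decrease in the secret order.

    logits: list of per-class scores for one input (task classes + OOD classes).
    secret_order: hypothetical owner-chosen list of OOD class indices.
    """
    scores = [logits[c] for c in secret_order]
    return all(a > b for a, b in zip(scores, scores[1:]))


def order_match_rate(batch_logits, secret_order):
    """Fraction of inputs whose OOD class ranking matches the secret order."""
    hits = sum(follows_secret_order(row, secret_order) for row in batch_logits)
    return hits / len(batch_logits)


# Toy scores (made up): classes 0-1 are task classes, classes 2-4 are OOD classes.
batch = [
    [5.0, 1.0, 0.2, 0.9, 0.5],  # OOD ranking 3 > 4 > 2: matches the secret order
    [5.0, 1.0, 0.9, 0.2, 0.5],  # OOD ranking 2 > 4 > 3: does not match
]
rate = order_match_rate(batch, secret_order=[3, 4, 2])  # 0.5 for this toy batch
```

In both toy rows the highest score still belongs to task class 0, which illustrates why a watermark carried by the OOD ranking need not change the primary prediction.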
For the AI development community, this research offers a practical protection mechanism that could accelerate intellectual property sharing and commercialization of fine-tuned models. Organizations developing task-specific prompts can verify ownership claims more reliably, reducing friction in AI model licensing and attribution disputes. The hypothesis-test-guided verification protocol provides formal guarantees about when auditing succeeds, adding credibility to ownership claims.
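The article does not spell out the paper's test statistic, so the sketch below only illustrates what a hypothesis-test-guided ownership check could look like in general. It assumes a one-sided binomial test in which the null hypothesis is that a suspect model ranks k OOD classes in the secret order purely by chance, taken here to be probability 1/k! per query. The function names, the chance model, and the significance level `alpha` are assumptions, not details from the paper.

```python
from math import comb, factorial


def binomial_p_value(matches, trials, chance_rate):
    """P(X >= matches) for X ~ Binomial(trials, chance_rate)."""
    return sum(
        comb(trials, i) * chance_rate**i * (1 - chance_rate) ** (trials - i)
        for i in range(matches, trials + 1)
    )


def audit(matches, trials, num_ood_classes, alpha=0.01):
    """Return (p_value, flagged) under the assumed chance model.

    Assumption: an unrelated model produces the secret order of
    num_ood_classes classes with probability 1 / num_ood_classes! per query.
    """
    chance_rate = 1.0 / factorial(num_ood_classes)
    p_value = binomial_p_value(matches, trials, chance_rate)
    return p_value, p_value < alpha


# Made-up counts: 10 of 10 queries show the secret order over 3 OOD classes.
p_value, flagged = audit(matches=10, trials=10, num_ood_classes=3)
```

A small p-value means the observed number of matches would be very unlikely for an independent model, which is the kind of statistical evidence an ownership claim can rest on; a model that matches only about as often as chance predicts would not be flagged.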
Looking forward, as soft prompting becomes standard practice for adapting foundation models, robust auditing mechanisms become essential infrastructure. The success of SWAP's approach may inspire similar techniques for other adaptable model architectures, potentially influencing how AI intellectual property rights develop in coming years.
- SWAP proposes sequential watermarking that embeds protection markers in out-of-distribution class prediction space rather than primary task space
- Existing non-intrusive and intrusive auditing methods fail due to fundamental conflicts between watermarking and task performance objectives
- The technique maintains original model predictions while remaining robust against potential removal attacks
- Extensive validation across 11 datasets demonstrates effectiveness without harming model utility or creating false positives
- Hypothesis-test-guided verification provides theoretical guarantees for when ownership auditing succeeds