y0news

#model-protection News & Analysis

5 articles tagged with #model-protection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · May 28 · 7/10

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

A position paper argues that the AI/ML community should retire the term "positive backdoor" and instead rigorously evaluate trigger-activated hidden behaviors under the name "Secret Alignment." The authors report that existing implementations are brittle with respect to confidentiality, integrity, and availability, and that protective claims currently lack a standardized evaluation framework.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

Researchers have developed Adaptive Stealing (AS), a watermark-stealing algorithm that exploits vulnerabilities in LLM watermarking systems by dynamically selecting attack strategies based on contextual token states. The work suggests that existing fixed-strategy watermark defenses are insufficient, exposing security gaps in the protection of proprietary LLM services and raising questions about watermark robustness.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Researchers propose trace rewriting techniques to protect language models from unauthorized knowledge distillation, in which smaller models learn from the outputs of larger ones without permission. The methods preserve model accuracy while making the outputs less useful for distillation and embedding detectable watermarks in student models.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

RandMark: On Random Watermarking of Visual Foundation Models

Researchers propose RandMark, a new method for watermarking visual foundation models to protect intellectual property rights. The approach uses a small encoder-decoder network to embed random digital watermarks into internal representations, enabling ownership verification with low false detection rates.