AIBullisharXiv โ CS AI ยท 5d ago7/105
๐ง
Self-Destructive Language Model
Researchers introduce SEAM, a novel defense mechanism that makes large language models 'self-destructive' when adversaries attempt harmful fine-tuning attacks. The system allows models to function normally for legitimate tasks but causes catastrophic performance degradation when fine-tuned on harmful data, creating robust protection against malicious modifications.