AI · Bearish · arXiv – CS AI · 6h ago · 7/10
The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization
Researchers show that current concept-erasure (unlearning) methods for text-to-image diffusion models do not actually remove harmful knowledge; they only disrupt the linguistic pathways that lead to it. The authors introduce IVO (Initial Latent Variable Optimization), an attack framework that exploits this weakness by reconstructing those mappings and reviving the dormant memories, exposing fundamental vulnerabilities in 11 existing unlearning techniques.