Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
Researchers propose Visual-Noise Guided In-Context Distillation (VGID), a novel framework for removing sensitive knowledge from multimodal large language models (MLLMs) without full retraining. The method combines visual perturbation with textual in-context unlearning to remove knowledge at the parameter level while maintaining model performance, addressing privacy and safety concerns specific to MLLMs.
Multimodal large language models (MLLMs) excel at vision-language tasks, but they can also memorize and expose sensitive information. Previous training-free unlearning methods preserve utility but leave the knowledge in the weights, so models remain vulnerable to reverse-engineering. VGID instead operates at the parameter level: it removes knowledge through distillation rather than merely masking outputs.
The framework builds an unlearning-oriented teacher distribution from the frozen base model by intervening on both modalities at once: it perturbs the visual input with noise and applies textual in-context prompting. Both interventions are needed because in multimodal systems the visual conditioning signal can independently trigger restricted outputs, so a text-only prompt may not suppress recall. The resulting intervention-induced distribution then guides a student model toward genuine unlearning, with no external teacher model and no annotated undesirable responses. In experiments, VGID reduced forget-set ROUGE-L by 0.371 while remaining competitive on the retain set, where performance dropped by only 0.055.
This work matters because it reconciles two competing objectives in AI safety: removing memorized sensitive knowledge while preserving general model capabilities. For the AI safety community, it is a step toward deployable unlearning that does not require expensive full retraining. The implications extend to privacy regulation and compliance, since organizations deploying MLLMs gain a technical pathway to remove restricted knowledge after deployment. Future research will likely explore scaling VGID to larger models and to modalities beyond vision and language.
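The teacher/student mechanism described above can be illustrated with a toy sketch. This is not the authors' implementation: the one-layer "model", the vocabulary, and every name below (`BASE_WEIGHTS`, `teacher_distribution`, `distill`, `mix`, `sigma`) are hypothetical. The specific choices are also assumptions: the visual perturbation is modeled as blending image features toward Gaussian noise, the teacher averages over several noise draws, the student minimizes KL(teacher || student) on the ordinary input, and no retain-set term is included. The demo shows the frozen model still answering "alice" when only the text prompt is applied, and the distilled student refusing on the plain query.

```python
import math
import random

# Hypothetical toy model: one linear layer maps a feature vector
# (4 image dims + 2 prompt dims) to next-token logits over a tiny vocabulary.
VOCAB = ["<refuse>", "alice", "paris", "doctor"]
PLAIN_PROMPT = [1.0, 0.0]           # stands in for an ordinary "Who is this?" query
FORGET_PROMPT = [0.0, 1.0]          # stands in for an in-context "forget" instruction
ALICE_IMAGE = [1.0, 0.0, 0.0, 0.0]  # image of the person to be forgotten

# Frozen base model: it recalls "alice" from the image, and the forget prompt
# boosts "<refuse>" but is too weak on its own to override the visual cue.
BASE_WEIGHTS = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],   # <refuse>
    [4.0, 0.0, 0.0, 0.0, 0.0, -1.0],  # alice
    [0.0, 4.0, 0.0, 0.0, 0.0, 0.0],   # paris
    [0.0, 0.0, 4.0, 0.0, 0.0, 0.0],   # doctor
]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kl(p, q):
    # KL(p || q); q must be strictly positive, which softmax guarantees.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_distribution(base, image, mix=0.8, sigma=0.25, n_samples=200, seed=0):
    # Assumed dual-modal intervention on the frozen base model: blend the image
    # features toward Gaussian noise, swap in the forget prompt, and average the
    # resulting next-token distributions over several noise draws.
    rng = random.Random(seed)
    avg = [0.0] * len(VOCAB)
    for _ in range(n_samples):
        noisy = [(1 - mix) * v + mix * rng.gauss(0.0, sigma) for v in image]
        probs = softmax(forward(base, noisy + FORGET_PROMPT))
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def distill(base, image, lr=0.5, steps=300):
    # The student starts as a copy of the base weights and is trained on the
    # ordinary (clean image, plain prompt) input to match the teacher.
    teacher = teacher_distribution(base, image)
    student = [row[:] for row in base]
    x = image + PLAIN_PROMPT
    for _ in range(steps):
        q = softmax(forward(student, x))
        # d KL(teacher || student) / d logit_k = q_k - teacher_k; the chain
        # rule through the linear layer multiplies by input feature x_j.
        for k, row in enumerate(student):
            g = q[k] - teacher[k]
            for j, xj in enumerate(x):
                row[j] -= lr * g * xj
    return teacher, student


def top_token(probs):
    return VOCAB[probs.index(max(probs))]


if __name__ == "__main__":
    query = ALICE_IMAGE + PLAIN_PROMPT
    text_only = softmax(forward(BASE_WEIGHTS, ALICE_IMAGE + FORGET_PROMPT))
    teacher, student = distill(BASE_WEIGHTS, ALICE_IMAGE)
    before = softmax(forward(BASE_WEIGHTS, query))
    after = softmax(forward(student, query))
    print(top_token(before), top_token(text_only), top_token(after))
    # prints: alice alice <refuse>
```

Only the student's weights change; the base model stays frozen and serves purely as the source of the teacher distribution, which is why no external teacher or annotated refusals are needed in this sketch.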
- VGID achieves parameter-level knowledge removal in MLLMs through distillation rather than output masking alone
- The dual-modal intervention combining visual perturbation and textual prompting addresses multimodal-specific safety vulnerabilities
- Framework eliminates need for external teacher models or explicit annotation of undesirable responses
- Experiments show a 0.371 reduction in forget-set ROUGE-L against only a 0.055 drop in retain-set performance
- Approach offers practical path toward post-deployment privacy compliance without full model retraining
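ROUGE-L, the metric behind the headline forget-set figure, scores a generated answer by the longest common subsequence (LCS) of tokens it shares with a reference answer, so a lower forget-set score means the model's answers reproduce less of the memorized content. The sketch below assumes lowercase whitespace tokenization and the F-measure with beta = 1; the exact ROUGE-L variant and tokenizer used in the evaluation are not stated here, and the example strings are hypothetical.

```python
def lcs_length(a, b):
    # Dynamic programming over token sequences, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    # Assumed variant: lowercase whitespace tokens, F-measure with beta = 1.
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref) if cand and ref else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "the person in the photo is alice smith"
print(rouge_l_f1("this is alice smith", reference))            # prints: 0.5
print(rouge_l_f1("sorry i cannot help with that", reference))  # prints: 0.0
```

In the first case the shared subsequence "is alice smith" gives precision 3/4 and recall 3/8; the refusal shares no tokens with the reference and scores zero.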