AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce FIT, a continual unlearning framework enabling large language models to efficiently forget privacy-sensitive, copyrighted, and harmful content across sequential deletion requests. The method addresses critical limitations of existing single-shot unlearning approaches by preventing catastrophic forgetting while maintaining model utility, demonstrated across models up to 14B parameters.
AI · Neutral · arXiv – CS AI · May 7 · 7/10
🧠Researchers present an automated pipeline for auditing behavioral changes in large language models when interventions are applied. The method generates human-readable hypotheses about model differences and validates them statistically, successfully identifying both intended and unexpected side-effects across real-world interventions like knowledge editing and unlearning.
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠Researchers introduce WARP, a new defense mechanism for machine unlearning protocols that protects against privacy attacks where adversaries can exploit differences between pre- and post-unlearning AI models. The technique reduces attack success rates by up to 92% while maintaining model accuracy on retained data.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce 'agentic unlearning' through Synchronized Backflow Unlearning (SBU), a framework that removes sensitive information from both AI model parameters and persistent memory systems. The method addresses critical gaps in existing unlearning techniques by preventing cross-pathway recontamination between memory and parameters.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce PPU-Bench, a benchmark for testing personalized partial unlearning in multimodal AI models, addressing the challenge of selectively removing sensitive memorized information while preserving model utility. The study reveals significant trade-offs between forgetting target knowledge and retaining non-target facts, proposing Boundary-Aware Optimization as a solution for fine-grained factual control.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce SafeRedir, an inference-time framework that safely redirects unsafe prompts in image generation models by rerouting them toward benign semantic regions without modifying underlying model weights. The lightweight approach uses token-level embedding interventions to mitigate generation of NSFW content and copyrighted styles while maintaining image quality and resisting adversarial attacks.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce a sequential unlearning framework that enables Large Language Models to forget sensitive data while maintaining performance, addressing GDPR compliance and the Right to be Forgotten in politically sensitive deployments. The method stabilizes general capabilities through positive fine-tuning before selectively suppressing designated patterns, demonstrating effectiveness on the SemEval-2025 benchmark with minimal accuracy degradation.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers introduce VLA-Forget, a new unlearning framework for vision-language-action (VLA) models used in robotic manipulation. The hybrid approach addresses the challenge of removing unsafe or unwanted behaviors from embodied AI foundation models while preserving their core perception, language, and action capabilities.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a new framework for improving safety in multimodal AI models by targeting unsafe relationships between objects rather than removing entire concepts. The approach uses parameter-efficient edits to suppress dangerous combinations while preserving benign uses of the same objects and relations.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers introduce SurgUn, a surgical unlearning method for text-to-image diffusion models that enables precise removal of specific visual concepts while preserving other capabilities. The approach addresses challenges in copyright compliance and content policy enforcement by applying targeted weight-space updates based on retroactive interference theory.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce ALTER, a new framework for efficiently "unlearning" specific knowledge from large language models while preserving their overall utility. The system uses an asymmetric LoRA architecture to selectively forget targeted information with 95% effectiveness while maintaining over 90% model utility, significantly outperforming existing methods.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 24
🧠Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.