#machine-unlearning News & Analysis

27 articles tagged with #machine-unlearning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 3d ago · 7/10

RULER: Representation-Level Verification of Machine Unlearning

Researchers introduce RULER, a verification framework that detects machine unlearning failures at the representation level rather than relying only on output metrics. The study finds that popular unlearning methods pass traditional evaluation tests yet still encode information about the forgotten data in their internal representations, highlighting a critical gap in current verification protocols.
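A common way to test whether information survives in a model's internal representations is a linear probe: if a simple classifier trained on hidden states can still recover an attribute of the forgotten data, that attribute has not been erased. The following is a minimal sketch of that generic probing idea under stated assumptions, not RULER's actual procedure; the synthetic 4-dimensional "representations", the leak_strength knob, and all function names are hypothetical.

```python
import math
import random


# Hypothetical data generator: 4-d "hidden states" for inputs with (label 1)
# or without (label 0) a concept that was supposedly unlearned.
# leak_strength shifts coordinate 0 by +/- leak_strength depending on the
# label, standing in for residual information the unlearning method missed.
def make_representations(n, leak_strength, seed):
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        x[0] += leak_strength if y == 1 else -leak_strength
        xs.append(x)
        ys.append(y)
    return xs, ys


def train_probe(xs, ys, lr=0.1, epochs=200):
    """Fit a linear (logistic-regression) probe with full-batch gradient descent."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            for i in range(d):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, xs, ys):
    """Fraction of held-out examples whose label the probe recovers."""
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)


def audit(leak_strength):
    train_set = make_representations(400, leak_strength, seed=1)
    test_set = make_representations(400, leak_strength, seed=2)
    w, b = train_probe(*train_set)
    return probe_accuracy(w, b, *test_set)


leaky_acc = audit(leak_strength=2.0)  # information still linearly decodable
clean_acc = audit(leak_strength=0.0)  # no residual signal: near chance
print(f"leaky: {leaky_acc:.2f}  clean: {clean_acc:.2f}")
```

A probe accuracy well above chance (the leaky case) flags residual information even when the model's outputs look clean; accuracy near 0.5 is what a successful erasure would look like under this particular test.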

AI · Neutral · arXiv – CS AI · 4d ago · 7/10

ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules

Researchers introduce ICCU, an in-context continual unlearning framework that removes the influence of specific data from language models without modifying their parameters. The method applies pattern-induced refusal rules at inference time, addressing the inefficiency of handling sequential unlearning requests in production deployments.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Researchers have developed BEAP, a black-box adversarial attack that bypasses machine unlearning safeguards in text-to-image diffusion models by generating natural-language prompts that evade detection filters. The attack achieves 60% higher success rates than previous methods while going undetected by safety systems, raising critical questions about the robustness of AI model safety mechanisms.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery

Researchers propose a machine unlearning framework to detect and remove neural backdoors (hidden triggers inserted during AI training that can compromise system integrity). Using model inversion and statistical analysis, the approach identifies malicious patterns and automatically decouples the model's behavior from the backdoor triggers, addressing a critical cybersecurity vulnerability in AI systems.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Researchers introduce RePAIR, a framework enabling users to instruct large language models to forget harmful knowledge, misinformation, and personal data through natural language prompts at inference time. The system uses a training-free method called STAMP that manipulates model activations to achieve selective unlearning with minimal computational overhead, outperforming existing approaches while preserving model utility.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Explainable LLM Unlearning Through Reasoning

Researchers introduce Targeted Reasoning Unlearning (TRU), a new method for removing specific knowledge from large language models while preserving general capabilities. The approach uses reasoning-based targets to guide the unlearning process, addressing issues with previous gradient ascent methods that caused unintended capability degradation.
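The gradient-ascent baseline that TRU is contrasted with can be illustrated on a one-parameter logistic model: instead of descending the loss, unlearning ascends the loss on the forget set. This toy sketch is not TRU; it only shows the baseline update rule, and all data points and hyperparameters are made up for illustration.

```python
import math


# Toy 1-parameter logistic model: p(y=1 | x) = sigmoid(w * x).
def mean_loss(w, data):
    """Mean negative log-likelihood over (x, y) pairs."""
    total = 0.0
    for x, y in data:
        z = w * x
        total += math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    return total / len(data)


def mean_grad(w, data):
    """d(mean_loss)/dw = mean((sigmoid(w * x) - y) * x)."""
    return sum((1.0 / (1.0 + math.exp(-w * x)) - y) * x for x, y in data) / len(data)


def train(data, w=0.0, lr=0.5, steps=200):
    for _ in range(steps):
        w -= lr * mean_grad(w, data)  # ordinary gradient descent
    return w


def gradient_ascent_unlearn(w, forget, lr=0.5, steps=30):
    for _ in range(steps):
        w += lr * mean_grad(w, forget)  # ascend the forget-set loss
    return w


data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
forget = [(0.5, 1)]
retain = [pt for pt in data if pt not in forget]

w_before = train(data)
w_after = gradient_ascent_unlearn(w_before, forget)
print(f"forget loss: {mean_loss(w_before, forget):.3f} -> {mean_loss(w_after, forget):.3f}")
print(f"retain loss: {mean_loss(w_before, retain):.3f} -> {mean_loss(w_after, retain):.3f}")
```

Because the single weight is shared by every example, pushing up the forget-set loss also pushes up the retain-set loss, a miniature version of the unintended capability degradation described above.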

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

Researchers identify a 'safety mirage' problem in vision language models where supervised fine-tuning creates spurious correlations that make models vulnerable to simple attacks and overly cautious with benign queries. They propose machine unlearning as an alternative that reduces attack success rates by up to 60.27% and unnecessary rejections by over 84.20%.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Researchers propose Partial Model Collapse (PMC), a machine unlearning method for large language models that removes private information without training directly on the sensitive data. The approach treats model collapse (the degradation models undergo when trained on their own outputs) as a feature, using it to deliberately forget targeted information while preserving general utility.
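Model collapse is often demonstrated with a toy loop: fit a distribution, sample from the fit, refit on those samples, and repeat. The sketch below reproduces that generic effect for a 1-D Gaussian; it is not the PMC method, and the sample size (10 per generation) and number of generations (200) are arbitrary assumptions.

```python
import random
import statistics


def recursive_gaussian_fit(generations=200, n=10, seed=0):
    """Fit a Gaussian, sample from the fit, refit on those samples, repeat.

    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood (biased) estimate
        history.append(sigma)
    return history


history = recursive_gaussian_fit()
print(f"std at generation 0: {history[0]:.3f}, at generation 200: {history[-1]:.2e}")
```

The fitted spread shrinks toward zero over generations because each refit on a finite sample tends to underestimate the spread; PMC, as summarized above, aims this kind of degradation at the targeted information only.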

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs

Researchers introduce Shadow Unlearning, a privacy-preserving machine unlearning method that removes training data influence from LLMs without exposing sensitive information to attacks. The Neuro-Semantic Projector Unlearning (NSPU) framework achieves this while maintaining model performance and is 10x more computationally efficient than existing approaches.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

Researchers introduce SHRED, a machine unlearning method for large language models that removes memorized private or copyrighted data without requiring a curated retain set of examples. By selectively demoting logits of high-information tokens while preserving model utility through self-distillation, SHRED achieves superior trade-offs between forgetting efficacy and performance compared to existing retain-set-dependent approaches.
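SHRED's exact token-selection rule and demotion amount are not given here. The sketch below assumes a hypothetical rule (surprisal under a reference unigram frequency table) and shows only how demoted self-distillation targets could be built from a model's own logits: the targets keep the model's distribution everywhere except at the most informative positions, where the memorized token's logit is lowered before the softmax. The vocabulary, frequencies, logits, and the parameters top_k and delta are all made up.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def demoted_targets(logits_per_pos, target_ids, token_freq, top_k=1, delta=5.0):
    """Build self-distillation targets from the model's own logits.

    Hypothetical rule: a target token's "information" is its surprisal
    -log2(freq) under a reference unigram table; at the top_k most
    informative positions, that token's logit is lowered by delta before
    the softmax. All other positions keep the model's own distribution.
    """
    info = [-math.log2(token_freq[t]) for t in target_ids]
    ranked = sorted(range(len(target_ids)), key=lambda i: info[i], reverse=True)
    demote = set(ranked[:top_k])
    targets = []
    for pos, (logits, tok) in enumerate(zip(logits_per_pos, target_ids)):
        adjusted = list(logits)
        if pos in demote:
            adjusted[tok] -= delta
        targets.append(softmax(adjusted))
    return targets, demote


# Toy vocabulary: 0="the", 1="is", 2="Alice", 3="phone" (made-up frequencies).
token_freq = {0: 0.3, 1: 0.2, 2: 0.001, 3: 0.01}
target_ids = [0, 2, 1]  # toy memorized sequence
logits_per_pos = [[4.0, 1.0, 0.0, 0.0], [0.0, 0.0, 5.0, 1.0], [1.0, 4.0, 0.0, 0.0]]

targets, demoted = demoted_targets(logits_per_pos, target_ids, token_freq)
print(demoted, [round(p, 3) for p in targets[1]])
```

In a self-distillation setup, the model would then be fine-tuned toward these targets (for example with a KL-divergence loss); because the unmodified positions reproduce the model's own predictions, they act as a built-in preservation signal in place of a curated retain set.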

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning

Researchers present a machine unlearning approach for multimodal large language models (MLLMs) that selectively removes target visual knowledge while preserving non-target information in both the visual and textual modalities. The method combines contrastive visual forgetting with null-space constraints to balance effective forgetting against knowledge retention, and it extends to continual unlearning scenarios.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models

Researchers introduce ICU-Bench, a benchmark for evaluating machine unlearning in multimodal AI models, motivated by the privacy concerns that large-scale training datasets raise. The benchmark shows that current unlearning methods struggle with continual streams of privacy deletion requests, highlighting a critical gap between theoretical approaches and the needs of real-world deployment.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing

Researchers propose Statistical Membership Inference (SMI), a new training-free auditing method that challenges the reliability of existing Membership Inference Attacks (MIAs) for verifying machine unlearning. The framework addresses a fundamental flaw in current auditing approaches by reformulating the problem as estimating non-member proportions in feature distributions, eliminating the need for computationally expensive shadow model training.
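SMI's estimator is not described beyond its framing of auditing as estimating the proportion of non-members. A minimal stand-in for that framing is a method-of-moments estimate on a scalar score: if the audited samples are a mixture of member-like and non-member-like samples, the mixture mean pins down the proportion. The score, the reference sets, and the function name are assumptions, and the numbers are illustrative only.

```python
import math
import statistics


def estimate_nonmember_fraction(audit_scores, member_ref, nonmember_ref):
    """Method-of-moments mixture estimate (a stand-in, not SMI's estimator).

    Assumes audit scores come from pi * NonMember + (1 - pi) * Member, so
    mean(audit) = pi * mean(nonmember) + (1 - pi) * mean(member).
    """
    mu_m = statistics.fmean(member_ref)
    mu_n = statistics.fmean(nonmember_ref)
    if math.isclose(mu_m, mu_n):
        raise ValueError("reference means coincide; proportion is not identifiable")
    pi = (statistics.fmean(audit_scores) - mu_m) / (mu_n - mu_m)
    return min(1.0, max(0.0, pi))  # clip to a valid proportion


# Made-up scalar scores (e.g. a per-sample confidence statistic; assumption).
member_ref = [0.9, 0.8, 1.0]
nonmember_ref = [0.1, 0.2, 0.3]
forget_set_scores = [0.2, 0.2, 0.2, 0.9]

pi_hat = estimate_nonmember_fraction(forget_set_scores, member_ref, nonmember_ref)
print(f"estimated non-member fraction: {pi_hat:.2f}")
```

After a successful unlearning run the forget set should look like non-members, so an estimate close to 1.0 is the desired outcome; 0.75 here would suggest roughly a quarter of the audited samples still resemble training members. A practical auditor would compare full distributions rather than only means.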

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA

Researchers propose an SVD-based orthogonal subspace projection method for continual machine unlearning that prevents interference between sequential deletion tasks in neural networks. The approach maintains model performance on retained data while effectively removing influence of unlearned data, addressing a critical limitation of naive LoRA fusion methods.
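The core operation behind orthogonal subspace projection is removing, from a new update, any component that lies in the subspace used by earlier deletion tasks. The sketch below assumes those earlier directions are given as vectors (for instance, the columns of a previous LoRA factor) and uses Gram–Schmidt to obtain an orthonormal basis for their span. The method described above uses an SVD instead, which spans the same space for the nonzero singular values and additionally allows truncating to the dominant directions. All names and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for span(vectors), dropping dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, basis):
    """Remove the components of `update` that lie in span(basis)."""
    w = list(update)
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


# Directions touched by an earlier deletion task (toy 3-d vectors standing in
# for, e.g., the columns of a previous LoRA "B" factor).
previous_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(previous_directions)

new_update = [3.0, 4.0, 5.0]  # raw update for the next deletion request
safe_update = project_out(new_update, basis)
print(safe_update)  # [0.0, 0.0, 5.0]
```

The projected update has zero overlap with the protected directions, so, at least to first order, applying it should not disturb what earlier deletion tasks changed.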

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning

Researchers propose TRU (Targeted Reverse Update), a machine unlearning framework designed to efficiently remove user data from multimodal recommendation systems without full retraining. The method addresses non-uniform data influence across ranking behavior, modality branches, and network layers through coordinated interventions, achieving better performance than existing approximate unlearning approaches.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

AdaProb: Efficient Machine Unlearning via Adaptive Probability

Researchers propose AdaProb, a machine unlearning method that enables trained AI models to efficiently forget specific data, supporting privacy protection and compliance with regulations such as GDPR. The approach uses adaptive probability distributions and reports a 20% improvement in forgetting effectiveness with 50% less computational overhead than existing methods.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study

Researchers present the first empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting classical unlearning methods to quantum settings and introducing quantum-specific strategies. The study reveals that quantum models can effectively support unlearning, with performance varying based on circuit depth and entanglement structure, establishing baseline insights for privacy-preserving quantum machine learning systems.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10

Selective Forgetting for Large Reasoning Models

Researchers propose a framework for 'selective forgetting' in Large Reasoning Models (LRMs) that removes sensitive information learned from training data while preserving general reasoning capabilities. The method uses retrieval-augmented generation to identify problematic reasoning segments and replace them with benign placeholders, addressing privacy and copyright concerns in AI systems.
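The placeholder-replacement step can be sketched as follows, with a lexical Jaccard-similarity match standing in for the retrieval component. The paper's retrieval-augmented generation pipeline is not reproduced here, and the threshold, placeholder text, and example snippets are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redact_reasoning(steps, sensitive_snippets, threshold=0.5, placeholder="[step removed]"):
    """Replace reasoning steps that closely match any retrieved sensitive snippet."""
    snippet_sets = [tokens(s) for s in sensitive_snippets]
    redacted = []
    for step in steps:
        step_set = tokens(step)
        score = max((jaccard(step_set, s) for s in snippet_sets), default=0.0)
        redacted.append(placeholder if score >= threshold else step)
    return redacted


sensitive = ["Alice Smith lives at 42 Elm Street"]
steps = [
    "First, recall where Alice Smith lives.",
    "Alice Smith lives at 42 Elm Street.",
    "So the answer is a street address.",
]
result = redact_reasoning(steps, sensitive)
print(result)
```

Only the step that closely matches the sensitive snippet is replaced (the first step overlaps on three of ten distinct words, below the 0.5 threshold), so the surrounding steps keep the trace's general reasoning structure. A real system would more likely use embedding-based retrieval than word overlap.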

AI · Bearish · arXiv – CS AI · Apr 6 · 6/10

Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning

Researchers introduce VLM-UnBench, the first benchmark for evaluating training-free visual concept unlearning in Vision Language Models. The study finds that, under realistic prompts, training-free methods fail to genuinely remove sensitive or copyrighted visual concepts; meaningful suppression occurs only under oracle conditions in which the prompt explicitly discloses the target concept.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10

SPARE: Self-distillation for PARameter-Efficient Removal

Researchers introduce SPARE, a new machine unlearning method for text-to-image diffusion models that efficiently removes unwanted concepts while preserving model performance. The two-stage approach uses parameter localization and self-distillation to achieve selective concept erasure with minimal computational overhead.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Computation and Communication Efficient Federated Unlearning via On-server Gradient Conflict Mitigation and Expression

Researchers propose FOUL (Federated On-server Unlearning), a new framework for efficiently removing specific participants' data from federated learning models without accessing client data. The approach reduces computational and communication costs while maintaining privacy compliance through a two-stage process that performs unlearning operations on the server side.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

RAZOR: Ratio-Aware Layer Editing for Targeted Unlearning in Vision Transformers and Diffusion Models

Researchers introduce RAZOR, a new framework for efficiently removing sensitive information from AI models like CLIP and Stable Diffusion without requiring full retraining. The method selectively edits specific layers and attention heads in transformer models to achieve targeted 'unlearning' while preserving overall performance.
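How RAZOR scores layers and heads is not specified beyond the "ratio-aware" label. The sketch below assumes a hypothetical rule: rank layers by the ratio of their importance for the forget data to their importance for the retain data, and edit only the top-ranked ones. The layer names and scores are made up; in practice such scores might come from per-layer gradient norms of the forget and retain losses, which is also an assumption.

```python
def rank_layers_by_ratio(forget_scores, retain_scores, top_k=2, eps=1e-8):
    """Rank layers by forget-importance / retain-importance (hypothetical rule).

    A high ratio marks a layer that matters much more for the data to be
    forgotten than for the data to be kept.
    """
    ratios = {
        name: forget_scores[name] / (retain_scores[name] + eps)
        for name in forget_scores
    }
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return ranked[:top_k], ratios


# Made-up per-layer importance scores; layer names are hypothetical.
forget_scores = {"block0.attn": 0.2, "block5.attn": 1.5, "block9.mlp": 0.6, "block11.attn": 0.4}
retain_scores = {"block0.attn": 1.0, "block5.attn": 0.5, "block9.mlp": 0.3, "block11.attn": 0.8}

selected, ratios = rank_layers_by_ratio(forget_scores, retain_scores)
print(selected)  # ['block5.attn', 'block9.mlp']
```

Restricting edits to high-ratio layers is one plausible way to remove targeted information while leaving the layers that general performance depends on largely untouched.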

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Stake the Points: Structure-Faithful Instance Unlearning

Researchers propose a new "structure-faithful" framework for machine unlearning that preserves semantic relationships in AI models while removing specific data. The method uses semantic anchors to maintain knowledge structure, showing significant performance improvements of 19-33% across image classification, retrieval, and face recognition tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

ROKA: Robust Knowledge Unlearning against Adversaries

Researchers introduce ROKA, a new machine unlearning method that prevents knowledge contamination and indirect attacks on AI models. The approach uses 'Neural Healing' to preserve important knowledge while forgetting targeted data, providing theoretical guarantees for knowledge preservation during unlearning.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction

Researchers found that machine unlearning in large language models, which aims to remove specific training data influence, is less effective in interactive settings than previously thought. Knowledge that appears forgotten in static tests can often be recovered through multi-turn conversations and self-correction interactions.

Page 1 of 2