#machine-unlearning News & Analysis

49 articles tagged with #machine-unlearning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Erased, but Not Gone: Output Forgetting Is Not True Forgetting

Researchers demonstrate that machine unlearning methods that appear successful at the output level, where unlearning is usually evaluated, still retain structured residual information in representation space that a truly retrained model does not. This finding reveals a gap between apparent and genuine forgetting and suggests that current unlearning evaluations systematically overestimate effectiveness.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

SPACE: Source-free Proxy Anchor Concept Erasure for MLLMs

Researchers introduce SPACE, a source-free machine unlearning framework for multimodal large language models that removes sensitive data without access to original training data. The two-stage approach uses text-guided proxy anchors and dual-constraint semantic isolation to erase target concepts while maintaining model performance, addressing growing privacy and regulatory compliance needs.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

LoTUS is a novel machine unlearning method that removes the influence of training data from pre-trained models without requiring full retraining. The approach smooths prediction probabilities to reduce over-confidence from memorized data and introduces a new evaluation metric (RF-JSD) for real-world conditions, outperforming existing methods on large-scale datasets like ImageNet1k.
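
As a rough illustration of the two ideas this summary mentions, the sketch below flattens an over-confident prediction with a softmax temperature and measures the distance between two distributions with the Jensen-Shannon divergence. It is not the LoTUS algorithm and does not compute RF-JSD; the function names, the temperature value, and the logits are assumptions chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a temperature above 1 flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats, skipping zero-mass terms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

# Hypothetical logits for one sample the model has memorized.
logits = [6.0, 1.0, 0.5]
sharp = softmax(logits)                     # over-confident prediction
smooth = softmax(logits, temperature=4.0)   # smoothed prediction

print("max probability before:", max(sharp))
print("max probability after: ", max(smooth))
print("JS divergence between them:", js_divergence(sharp, smooth))
```

Temperature scaling is only one way to smooth a distribution; it is used here because it is simple to state, not because the source describes it.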

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

Researchers introduce Mirage, a representation-level auditing framework that reveals existing machine unlearning methods in federated learning fail to truly forget sensitive data despite passing output-level tests. The study demonstrates that current approaches retain substantial class structure in internal representations, exposing a critical gap between certification standards and actual data privacy.

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack

Researchers identify two critical vulnerabilities in machine unlearning techniques: over-unlearning that damages nearby data and prototypical relearning attacks that can restore forgotten information. They propose Spotter, a new method combining masked knowledge-distillation and intra-class dispersion losses to address both security gaps in class-level unlearning.

AI · Neutral · arXiv – CS AI · May 28 · 7/10

RULER: Representation-Level Verification of Machine Unlearning

Researchers introduce RULER, a verification framework that detects machine unlearning failures at the representation level rather than relying on output metrics alone. The study reveals that popular unlearning methods pass traditional evaluation tests yet still retain encoded information about forgotten data in their internal representations, highlighting a critical gap in current verification protocols.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules

Researchers introduce ICCU, an in-context continual unlearning framework that removes specific data influence from language models without modifying parameters. The method uses pattern-induced refusal rules applied at inference time, addressing the inefficiency of sequential unlearning requests in production deployments.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Researchers have developed BEAP, a black-box adversarial attack that bypasses machine unlearning safeguards in text-to-image diffusion models by generating natural-language prompts that evade detection filters. The attack achieves 60% higher success rates than previous methods while remaining undetectable to safety systems, raising critical questions about the robustness of AI model safety mechanisms.

AI · Neutral · arXiv – CS AI · May 1 · 7/10

Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery

Researchers propose a machine unlearning framework to detect and remove neural backdoors—hidden triggers inserted during AI training that can compromise system integrity. Using model inversion and statistical analysis, the approach identifies malicious patterns and autonomously detaches machine behavior from backdoor triggers, addressing a critical cybersecurity vulnerability in AI systems.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Researchers introduce RePAIR, a framework enabling users to instruct large language models to forget harmful knowledge, misinformation, and personal data through natural language prompts at inference time. The system uses a training-free method called STAMP that manipulates model activations to achieve selective unlearning with minimal computational overhead, outperforming existing approaches while preserving model utility.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Explainable LLM Unlearning Through Reasoning

Researchers introduce Targeted Reasoning Unlearning (TRU), a new method for removing specific knowledge from large language models while preserving general capabilities. The approach uses reasoning-based targets to guide the unlearning process, addressing issues with previous gradient ascent methods that caused unintended capability degradation.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

Researchers identify a 'safety mirage' problem in vision language models where supervised fine-tuning creates spurious correlations that make models vulnerable to simple attacks and overly cautious with benign queries. They propose machine unlearning as an alternative that reduces attack success rates by up to 60.27% and unnecessary rejections by over 84.20%.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Researchers propose Partial Model Collapse (PMC), a novel machine unlearning method for large language models that removes private information without directly training on sensitive data. The approach leverages model collapse (the degradation that occurs when models are trained on their own outputs) as a feature to deliberately forget targeted information while preserving general utility.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

SCRUB-FL: Sanitizing and Cleansing Representations via Unlearning of Backdoors

Researchers introduce SCRUB-FL, a post-training defense mechanism against backdoor attacks in federated learning systems that reduces attack success rates to 3.88% while preserving model accuracy. The method uses spectral analysis and machine unlearning to remove trigger-target associations without requiring prior knowledge of attack patterns or clean datasets.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

OFMU: Optimization-Driven Framework for Machine Unlearning

Researchers propose OFMU, a bi-level optimization framework designed to enable large language models to selectively unlearn specific data without full retraining, addressing privacy and regulatory compliance needs. The method balances forgetting targeted information while maintaining model performance through hierarchical optimization with theoretical convergence guarantees.

AI · Neutral · arXiv – CS AI · Jun 12 · 6/10

MLUBench: A Benchmark for Lifelong Unlearning Evaluation in MLLMs

Researchers introduce MLUBench, a large-scale benchmark for evaluating lifelong unlearning in multimodal large language models (MLLMs), revealing that existing methods suffer from cumulative degradation. The study identifies a unique challenge in MLLM unlearning: removing data from one modality can damage the model's multimodal alignment, and proposes LUMoE as a solution to mitigate this degradation.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

Researchers introduce ASRU, a machine unlearning framework for multimodal large language models that balances removing sensitive information with maintaining generation quality. The approach uses activation steering and reinforcement learning to achieve superior unlearning effectiveness while preserving model utility, demonstrating significant improvements on Qwen3-VL.

AI · Neutral · Google Research Blog · Jun 10 · 6/10

New framework for auditing machine unlearning

Researchers have developed a new framework for auditing machine unlearning systems, establishing standardized methods to verify that AI models can effectively forget specific data. This advancement addresses growing regulatory and ethical requirements around data removal and privacy compliance in machine learning.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

Researchers propose TRACE, a novel machine unlearning technique designed specifically for Mixture-of-Experts language models that addresses the problem of forget-critical experts receiving insufficient regularization during the unlearning process. The method achieves 9% relative utility improvements by detecting and calibrating expert activation patterns to match forget and retain data distributions, demonstrating consistent performance gains across multiple MoE architectures.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation

Researchers introduce TRACER, a novel framework for removing sensitive concepts from generative recommendation systems while preserving overall utility. The method uses token reassignment to handle a challenge specific to recommendation: semantic IDs are shared between the items to be forgotten and the items to be retained, unlike discrete tokens in language models.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

Researchers introduce REMEDI, a benchmark for evaluating machine unlearning methods in clinical disease inference using real patient data from MIMIC-III. The study reveals fundamental trade-offs between model utility and data removal effectiveness, with existing unlearning techniques proving poorly suited for multi-label medical classification tasks.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

On the importance of multiple training seeds for evaluating machine unlearning

A new study reveals that evaluating machine unlearning algorithms requires multiple training seeds, not just multiple unlearning seeds from a single trained model, as unlearning performance varies significantly based on initial training conditions. This finding challenges current evaluation practices in machine unlearning research across image classification, federated learning, and large language models.
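
The difference between the two kinds of seed can be sketched as a nested evaluation loop: an outer loop over training seeds and an inner loop over unlearning seeds. The code below is a toy sketch, not the study's protocol. `train_model` and `unlearn_and_score` are hypothetical stand-ins that return random numbers, and the noise scales inside them are arbitrary assumptions, so the printed values carry no empirical meaning.

```python
import random
import statistics

def train_model(train_seed):
    """Hypothetical stand-in for training: the 'model' is a single number."""
    return random.Random(train_seed).gauss(0.0, 1.0)

def unlearn_and_score(model, unlearn_seed):
    """Hypothetical stand-in for unlearning followed by one evaluation metric."""
    noise = random.Random(unlearn_seed).gauss(0.0, 0.1)
    return model + noise

def evaluate(train_seeds, unlearn_seeds):
    """Return (mean within-model spread, spread across trained models)."""
    scores = {
        t: [unlearn_and_score(train_model(t), u) for u in unlearn_seeds]
        for t in train_seeds
    }
    # Spread caused by the unlearning seed alone, averaged over trained models.
    within = statistics.mean(statistics.stdev(v) for v in scores.values())
    # Spread across trained models; undefined when only one model was trained.
    means = [statistics.mean(v) for v in scores.values()]
    between = statistics.stdev(means) if len(means) > 1 else None
    return within, between

within, between = evaluate(train_seeds=range(5), unlearn_seeds=range(5))
print("spread across unlearning seeds:", within)
print("spread across training seeds:  ", between)
```

With a single training seed the second quantity cannot be estimated at all, which is the evaluation gap the summary describes.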

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance

Researchers introduce Alternating Token-Weighted Unlearning (ATWU), a new method for removing specific knowledge from language models while maintaining their general capabilities. The approach identifies which tokens are most relevant for forgetting by measuring conflict with model retention objectives, achieving state-of-the-art results without requiring external supervision or auxiliary models.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

Researchers introduce ZeroUnlearn, a novel machine unlearning framework that efficiently removes sensitive information from large language models through knowledge re-mapping and representational orthogonality, rather than expensive retraining. The method preserves overall model utility while selectively unlearning harmful data in few-shot settings, addressing critical privacy and safety concerns in LLMs.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation

SUPREME is an open-source framework that accelerates machine unlearning evaluation by distributing computation across multiple GPUs, addressing a critical bottleneck in AI model evaluation. The framework enables reproducible testing of data removal methods at scale, which has implications for privacy-preserving AI development and regulatory compliance.
