#llm-unlearning News & Analysis

6 articles tagged with #llm-unlearning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · May 12 · 7/10

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

Researchers identify critical honesty failures in large language model (LLM) unlearning methods: after attempting to forget harmful training data, models may hallucinate or behave inconsistently. They propose ReVa, a representation-alignment procedure that significantly improves honesty, making the model more reliably acknowledge that it no longer has the forgotten knowledge while maintaining utility on retained information.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning

Researchers introduce DiPO (Distribution Preference Optimization), an LLM unlearning algorithm that operates on token-level output distributions rather than on whole responses. It addresses limitations of existing approaches such as NPO (Negative Preference Optimization) by constructing preference signals through selective amplification of model logits, and it reports superior performance on unlearning benchmarks while maintaining model utility.
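For context on the granularity DiPO moves away from, here is a minimal sketch of the response-level NPO loss, L = (2/β) · E[log(1 + (π_θ(y|x) / π_ref(y|x))^β)], where each forget response contributes one scalar log-ratio. It assumes per-response log-probabilities are already computed; the function names `npo_loss` and `softplus` are illustrative, and this is the NPO baseline, not DiPO's token-level method.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def npo_loss(logp_model, logp_ref, beta=0.1):
    """Response-level NPO loss, averaged over a batch of forget-set examples.

    logp_model[i] / logp_ref[i]: summed log-probability log pi(y_i | x_i) of the
    i-th forget response under the model being unlearned and under the frozen
    reference (pre-unlearning) model.
    """
    if len(logp_model) != len(logp_ref) or not logp_model:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for lp, lr in zip(logp_model, logp_ref):
        # One scalar log-ratio per *whole response*: log(pi_theta / pi_ref).
        log_ratio = lp - lr
        # (2 / beta) * log(1 + (pi_theta / pi_ref) ** beta)
        total += (2.0 / beta) * softplus(beta * log_ratio)
    return total / len(logp_model)


# Unchanged model: loss equals (2 / beta) * log 2.
print(npo_loss([-42.0], [-42.0]))
# Lowering the forget response's likelihood lowers the loss.
print(npo_loss([-60.0], [-42.0]) < npo_loss([-42.0], [-42.0]))
```

Because the whole response collapses into a single ratio, every token in it is pushed down together; a token-distribution-level method such as DiPO instead works with per-token output distributions.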

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

Researchers introduce NSRU (Null-Space Constrained Response-Specified Unlearning), a framework for controlling what large language models forget while preserving their general capabilities. The method constrains low-rank adaptation (LoRA) updates to the null space of the subspace spanned by retained knowledge, so it can suppress undesired knowledge and replace it with specified responses while maintaining utility on benign tasks.
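The core null-space idea can be illustrated with a minimal sketch, assuming the retain subspace is given directly as a few input vectors to one linear layer: each row of the LoRA factor A is stripped of its component along the retain vectors, so the update ΔW = B·A changes nothing for any input in that subspace. The helper names (`orthonormal_basis`, `null_space_lora_delta`) are hypothetical, and how NSRU actually estimates the retain subspace and constrains its factors may differ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for u in basis:
            c = dot(w, u)
            w = [a - c * b for a, b in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors already in the span
            basis.append([a / norm for a in w])
    return basis


def remove_retain_component(row, basis):
    """Project `row` onto the orthogonal complement of span(basis)."""
    out = list(row)
    for u in basis:
        c = dot(out, u)
        out = [a - c * b for a, b in zip(out, u)]
    return out


def null_space_lora_delta(B, A, retain_inputs):
    """Return delta_W = B @ A_proj (d_out x d_in) for a LoRA pair B (d_out x r),
    A (r x d_in). Each row of A loses its component along the retain inputs,
    so delta_W @ k == 0 for every k in span(retain_inputs)."""
    basis = orthonormal_basis(retain_inputs)
    A_proj = [remove_retain_component(row, basis) for row in A]
    r, d_in = len(A_proj), len(A_proj[0])
    return [[sum(B[i][k] * A_proj[k][j] for k in range(r)) for j in range(d_in)]
            for i in range(len(B))]


B = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
A = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, -1.0, 2.0]]
retain = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
delta = null_space_lora_delta(B, A, retain)
# Largest change the update causes on any retain input (zero up to rounding).
print(max(abs(dot(row, k)) for row in delta for k in retain))
```

Projecting A rather than the full ΔW keeps the update low-rank; in practice the retain subspace would typically be estimated from hidden representations, for example by keeping the leading singular vectors of retain activations.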

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning

Researchers challenge conventional LLM unlearning practices, showing that relying on a single neighbor set and standard 1:1 sampling is suboptimal for removing knowledge while preserving model utility. The study proposes Modular Entity-Level Unlearning (MELU) as a more effective alternative and sets out best practices for reliable unlearning.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

De-attribute to Forget for LLM Unlearning

Researchers propose DareU, an LLM unlearning framework that uses data-attribution rewards and reinforcement learning to remove the influence of specific training data from large language models. Unlike existing approaches that maximize loss on the forget set, it reduces the model's attribution to the owners of the data being forgotten, targeting the common problems of over-forgetting and degraded model utility.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation

Researchers propose a multi-objective unlearning framework for Large Language Models that simultaneously removes hazardous information, preserves general utility, avoids over-refusal, and resists adversarial attacks. The method uses unified domain representation and bidirectional logit distillation to harmonize competing optimization goals, achieving state-of-the-art performance across diverse unlearning requirements.