AI · Neutral · arXiv – CS AI · 6h ago · 6/10
De-attribute to Forget for LLM Unlearning
Researchers propose DareU, an LLM unlearning framework that uses data attribution rewards and reinforcement learning to remove the influence of specific training data from large language models. Unlike existing approaches that maximize loss on the forget set, DareU reduces the attribution scores assigned to the owners of the forgotten data, which is intended to limit over-forgetting and the degradation of model utility.