AI · Neutral · arXiv (CS AI) · 7h ago · 6/10
Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation
Researchers propose a multi-objective unlearning framework for large language models (LLMs) that simultaneously removes hazardous information, preserves general utility, avoids over-refusal, and resists adversarial attacks. The method uses a unified domain representation and bidirectional logit distillation to harmonize these competing optimization goals, and the authors report state-of-the-art performance across diverse unlearning requirements.