AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning
Researchers propose DualSelect, a framework for fine-tuning large language models that jointly selects relevant safety references and compatible task samples, aiming to preserve safety alignment while improving task performance. The reported safety gains are 5.10+ points across models from 1B to 8B parameters, with no loss of utility.