Token-level Data Selection for Safe LLM Fine-tuning

arXiv – CS AI | Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang | March 3, 2026 at 05:00 AM

🤖AI Summary

Researchers have developed TOSS, a framework for safely fine-tuning large language models that selects training data at the token level rather than the sample level. The method identifies and removes unsafe tokens while preserving task-specific information, and it outperforms existing sample-level defenses at maintaining both safety and utility.

Key Takeaways

→Fine-tuning LLMs on custom datasets can significantly degrade their safety.
→The TOSS framework uses token-level data selection to identify unsafe content with higher precision than sample-level methods.
→The method measures each token's safety risk as the difference in its loss under a safety-degraded model and under a utility-oriented model.
→TOSS-Pro introduces progressive refinement to iteratively improve unsafe token identification.
→Experimental results show the approach maintains superior downstream task performance while ensuring model safety.
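
To make the loss-difference idea in the takeaways concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the sign convention (a token counts as riskier when the safety-degraded model predicts it more easily than the utility-oriented model), and the fixed `drop_fraction` are all assumptions made for illustration. The per-token losses are hard-coded stand-ins for values that would come from two real models.

```python
from typing import List


def token_risk_scores(loss_degraded: List[float], loss_utility: List[float]) -> List[float]:
    """Per-token risk score (assumed convention): utility-model loss minus
    safety-degraded-model loss. A large positive value means the degraded
    model finds the token much easier to predict."""
    if len(loss_degraded) != len(loss_utility):
        raise ValueError("loss lists must align token by token")
    return [u - d for d, u in zip(loss_degraded, loss_utility)]


def keep_mask(scores: List[float], drop_fraction: float) -> List[int]:
    """Return 1 for tokens kept in the fine-tuning loss and 0 for the
    highest-scoring (assumed riskiest) tokens."""
    n_drop = int(len(scores) * drop_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [0 if i in dropped else 1 for i in range(len(scores))]


def masked_mean_loss(token_losses: List[float], mask: List[int]) -> float:
    """Average the training loss over kept tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(token_losses, mask)) / kept


# Hypothetical per-token losses for a single four-token training sample.
loss_degraded = [2.0, 0.5, 1.75, 0.25]
loss_utility = [1.5, 3.0, 2.0, 2.25]

scores = token_risk_scores(loss_degraded, loss_utility)
mask = keep_mask(scores, drop_fraction=0.5)

# Hypothetical losses of the model being fine-tuned on the same tokens.
train_losses = [1.0, 4.0, 2.0, 3.0]
loss = masked_mean_loss(train_losses, mask)
```

In this toy sample, the second and fourth tokens receive the highest scores and are masked out, so only the first and third tokens contribute to the training loss. A sample-level filter would have had to keep or discard all four tokens together, which is the precision gap the summary describes.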