AIBullish — arXiv · CS AI · 9h ago · 7/10
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
Researchers present a framework for removing backdoors from generative large language models that requires neither knowledge of the trigger nor a clean reference model. Inspired by immunization, the method creates synthetic backdoored variants of the model and uses them to localize and neutralize the malicious components while preserving the model's generative capabilities.
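The core intuition — comparing synthetic backdoored variants of a model to localize parameters that shift consistently, with no clean reference needed — can be sketched on a toy numpy "model". Everything here (the perturbation model, the scoring rule, the 90th-percentile threshold) is an illustrative assumption, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": one weight matrix standing in for an LLM layer.
base = rng.normal(size=(8, 8))

def implant_synthetic_backdoor(weights, rng):
    """Hypothetical stand-in for fine-tuning on synthetic trigger data:
    each variant shifts a shared 'backdoor-prone' subset of weights
    (row 0, first 3 columns) and adds benign drift elsewhere."""
    w = weights + rng.normal(scale=0.01, size=weights.shape)  # benign drift
    w[0, :3] += rng.normal(loc=0.5, scale=0.05, size=3)       # backdoor shift
    return w

# Build several synthetic backdoored variants. No clean reference is
# assumed: we only compare the variants against the (possibly already
# compromised) base model.
variants = [implant_synthetic_backdoor(base, rng) for _ in range(16)]
deltas = np.stack([v - base for v in variants])

# Parameters that shift consistently across variants (large mean shift,
# low variance) are flagged as backdoor-associated components.
mean_shift = np.abs(deltas.mean(axis=0))
score = mean_shift / (deltas.std(axis=0) + 1e-8)
mask = score > np.quantile(score, 0.9)

# "Purify" one variant by reverting only the flagged parameters,
# leaving the rest of the model (its generative capacity) untouched.
purified = np.where(mask, base, variants[0])
```

The key design point this toy captures is that consistency across many synthetic variants, rather than comparison against a known-clean model, is what singles out the backdoor-associated parameters.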