AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
Researchers developed a framework that removes backdoors from generative large language models without prior knowledge of the triggers and without access to a clean reference model. The method takes an immunization-inspired approach: it creates synthetic backdoored variants of the model and uses them to identify and neutralize the malicious components while preserving the model's generative capabilities.