SCRUB-FL: Sanitizing and Cleansing Representations via Unlearning of Backdoors
Researchers introduce SCRUB-FL, a post-training defense mechanism against backdoor attacks in federated learning systems that reduces attack success rates to 3.88% while preserving model accuracy. The method uses spectral analysis and machine unlearning to remove trigger-target associations without requiring prior knowledge of attack patterns or clean datasets.
Federated learning has emerged as a critical infrastructure for privacy-preserving machine learning, enabling multiple parties to train models collaboratively without exposing raw data. However, this decentralization creates security vulnerabilities where malicious participants can inject backdoors—hidden triggers that manipulate model predictions for specific inputs. SCRUB-FL addresses a fundamental gap in existing defenses by operating post-convergence, when traditional aggregation-time protections have already failed to eliminate embedded backdoor behaviors.
The technical approach represents a meaningful advance in adversarial robustness. Rather than detecting or filtering poisoned clients during training, SCRUB-FL employs a two-stage strategy. In the first stage, clients identify suspicious samples through statistical analysis and train lightweight generative models that capture trigger distributions; SCRUB-FL then aggregates these representations on the server. In the second stage, after training concludes, the server synthesizes trigger-like samples and applies machine unlearning to push the model toward uniform predictions on them, erasing the learned trigger-target associations without catastrophic forgetting.
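To make "forcing the model toward uniform predictions" concrete, here is a minimal sketch of one way such an unlearning objective could look: the cross-entropy between a uniform target and the model's softmax output, minimized by gradient descent. This is an illustration under stated assumptions, not the authors' implementation. The function names, the learning rate, and the example logits are hypothetical, and the update is applied directly to the logits of a single sample, whereas a real system would update model weights over many synthesized samples and would need additional care to retain clean accuracy.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_ce_loss(logits):
    """Cross-entropy between a uniform target and softmax(logits).

    Minimized (value log k) exactly when the prediction is uniform.
    """
    probs = softmax(logits)
    k = len(logits)
    return -sum(math.log(p) for p in probs) / k

def uniform_ce_grad(logits):
    """Gradient of uniform_ce_loss w.r.t. the logits: softmax(logits) - 1/k."""
    probs = softmax(logits)
    k = len(logits)
    return [p - 1.0 / k for p in probs]

def unlearn_step(logits, lr=0.5):
    """One gradient-descent step (simplification: updates logits, not weights)."""
    grad = uniform_ce_grad(logits)
    return [z - lr * g for z, g in zip(logits, grad)]

# Hypothetical logits for one synthesized trigger-like sample.
# Class 2 plays the role of the backdoor target, so it dominates at first.
logits = [0.1, -0.3, 6.0, 0.2]
before = softmax(logits)

for _ in range(200):
    logits = unlearn_step(logits)

after = softmax(logits)
print("target-class probability before:", round(before[2], 4))
print("class probabilities after:", [round(p, 4) for p in after])
```

In this toy setting the probability mass initially concentrated on the target class spreads evenly across all classes, which is the behavior the unlearning stage aims for on trigger-like inputs.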
For enterprises deploying federated learning across sensitive domains such as healthcare, finance, and autonomous systems, backdoor resilience directly impacts operational security and liability exposure. The method remains effective when 40% of participants are malicious, which indicates robustness even under heavy adversarial presence. Its advantage over prior work is that it requires neither knowledge of the trigger nor an external clean dataset, which reduces deployment friction.
Developers and security teams should monitor whether SCRUB-FL's unlearning methodology becomes integrated into major federated learning frameworks. The research indicates that post-training sanitization can achieve robustness comparable to early-detection methods, potentially shifting how distributed ML systems are architected. Broader adoption depends on the computational overhead the defense adds and on its scalability to production-scale models.
- SCRUB-FL reduces backdoor attack success rates to 3.88% while maintaining over 91% normal accuracy on benchmarks.
- Post-convergence machine unlearning can eliminate trigger-target associations without prior knowledge of attack patterns.
- The method requires only lightweight WGAN-GP models on clients, reducing computational burden compared to alternative defenses.
- Effectiveness demonstrated against 40% malicious participation, suggesting robustness in high-adversary environments.
- Requiring neither clean proxy datasets nor knowledge of the trigger pattern makes practical deployment more feasible.
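The two headline metrics above, attack success rate and normal (clean) accuracy, can be computed as in the following sketch. It assumes a commonly used definition of attack success rate: among triggered inputs whose true class is not the attacker's target, the fraction that the model classifies as the target. Whether the SCRUB-FL evaluation uses exactly this definition is an assumption, and the function names and toy labels are hypothetical.

```python
def attack_success_rate(preds_on_triggered, true_labels, target_label):
    """Fraction of triggered inputs (true class != target) predicted as the target."""
    eligible = [
        pred
        for pred, label in zip(preds_on_triggered, true_labels)
        if label != target_label  # samples already in the target class are excluded
    ]
    if not eligible:
        raise ValueError("no triggered samples outside the target class")
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)

def clean_accuracy(preds_on_clean, true_labels):
    """Fraction of clean (untriggered) inputs classified correctly."""
    if not true_labels:
        raise ValueError("empty evaluation set")
    correct = sum(1 for pred, label in zip(preds_on_clean, true_labels) if pred == label)
    return correct / len(true_labels)

# Toy example with made-up labels; class 2 is the hypothetical backdoor target.
true_labels = [0, 1, 2, 3, 1, 0]
preds_on_triggered = [2, 2, 2, 3, 1, 2]  # predictions when the trigger is applied
preds_on_clean = [0, 1, 2, 3, 1, 1]      # predictions on the unmodified inputs

asr = attack_success_rate(preds_on_triggered, true_labels, target_label=2)
acc = clean_accuracy(preds_on_clean, true_labels)
print(f"attack success rate: {asr:.2%}, clean accuracy: {acc:.2%}")
```

A defense succeeds when the first number falls sharply while the second stays close to its pre-defense value.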