Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning
Researchers introduce Mirage, a representation-level auditing framework that reveals that existing machine unlearning methods in federated learning fail to truly forget sensitive data, even though they pass output-level tests. The study shows that current approaches retain substantial class structure in their internal representations, exposing a critical gap between certification standards and actual data privacy.
The Mirage framework addresses a fundamental vulnerability in machine unlearning research by showing that certification methods relying solely on output-level metrics provide false assurance of data deletion. When visual models undergo unlearning procedures, their external behavior may change while their internal representations preserve the patterns of the original training data. The distinction matters because federated learning systems handle sensitive information across distributed networks, and weak forgetting mechanisms could compromise user privacy even when a system appears to comply with unlearning protocols.
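To make the output-level versus representation-level distinction concrete, the sketch below probes a layer's features directly: if a simple classifier trained on frozen representations can still pick out the forgotten class, that class structure survived unlearning even if the model's outputs changed. It assumes synthetic Gaussian features in place of real activations and uses a nearest-class-centroid probe as a lightweight stand-in for a trained linear probe. `centroid_probe_accuracy` is a hypothetical helper, not Mirage's LPR metric.

```python
import numpy as np


def centroid_probe_accuracy(feats, labels, train_frac=0.5, seed=0):
    """Hypothetical helper: fit a nearest-class-centroid probe on a random
    split of frozen representations and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    cut = int(train_frac * len(labels))
    train, test = idx[:cut], idx[cut:]
    classes = np.unique(labels[train])
    # One centroid per class, estimated from the training split only.
    centroids = np.stack(
        [feats[train][labels[train] == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from each test point to each centroid.
    dists = ((feats[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == labels[test]).mean())


# Synthetic stand-ins for one layer's activations on n inputs (assumption).
rng = np.random.default_rng(42)
n, dim = 400, 16
is_forgotten = np.repeat([1, 0], n // 2)  # 1 = sample from the forgotten class

# Case 1: outputs may have changed, but the forgotten class still forms its own cluster.
still_clustered = rng.normal(size=(n, dim)) + 2.0 * is_forgotten[:, None]
# Case 2: no class-dependent structure remains in this layer.
scrambled = rng.normal(size=(n, dim))

acc_clustered = centroid_probe_accuracy(still_clustered, is_forgotten)
acc_scrambled = centroid_probe_accuracy(scrambled, is_forgotten)
print(f"probe accuracy, clustered: {acc_clustered:.2f}")  # close to 1.0
print(f"probe accuracy, scrambled: {acc_scrambled:.2f}")  # close to chance (0.5)
```

Probe accuracy near chance is what one would hope to see after genuine forgetting. In practice the fairer comparison point is the same probe applied to a model retrained without the forgotten data, since ordinary feature generalization can leave some class structure even there.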
Vertical Federated Learning has emerged as a privacy-preserving approach in which different parties contribute different features for the same training samples. The approach has gained traction because organizations want to collaborate on machine learning without exposing raw data. However, the Mirage study reveals an unlearning trilemma: no existing method achieves high utility, output-level forgetting, and representation-level forgetting simultaneously. This creates a fundamental tradeoff that researchers and practitioners must navigate.
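For readers new to the vertical setting, the following sketch shows how features are partitioned: every party holds the same sample IDs but different columns, each runs its own bottom model, and only the resulting embeddings are combined. The two-party split, the column assignment, and the random linear bottom models are illustrative assumptions, not the configuration used in the Mirage study. The joined embeddings are the kind of intermediate representation that output-level unlearning tests never inspect.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, emb_dim = 8, 6, 3

# Full feature matrix; in a real deployment no single party ever holds all of it.
X = rng.normal(size=(n_samples, n_features))

# Hypothetical split: both parties see the same 8 samples, but different columns.
party_cols = {"party_a": [0, 1, 2, 3], "party_b": [4, 5]}
local_views = {p: X[:, cols] for p, cols in party_cols.items()}

# Each party applies its own bottom model (here just a random linear map).
bottom_weights = {
    p: rng.normal(size=(len(cols), emb_dim)) for p, cols in party_cols.items()
}
embeddings = {p: local_views[p] @ bottom_weights[p] for p in party_cols}

# The coordinator receives only embeddings, aligned by sample, and joins them
# per sample before its top model; raw feature columns are never shared.
joint = np.concatenate([embeddings[p] for p in party_cols], axis=1)
print(joint.shape)  # (8, 6): 8 samples, 3 embedding dims from each of 2 parties
```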
The asymmetry between class-level and sample-level forgetting is particularly concerning: class information persists strongly across network layers even after unlearning. For developers deploying federated learning systems, the research calls for promptly evaluating current unlearning implementations against representation-level standards instead of relying on traditional output metrics alone. Organizations processing sensitive data must demand transparency about how their information is forgotten at deeper architectural levels.
- →Existing federated unlearning methods show class-structure recovery up to 15.4 points higher than retrained baselines, despite passing output-level tests
- →The unlearning trilemma: no method simultaneously achieves utility, output-level forgetting, and representation-level forgetting, which forces fundamental architectural tradeoffs
- →Class-level unlearning leaves 97% representational traces, while sample-level forgetting approaches random chance, exposing asymmetric privacy guarantees
- →Mirage's four diagnostics (LPR, CKA, separability scoring, and layer-wise analysis) establish representation-level evaluation as a necessary part of privacy certification
- →Current industry standards for machine unlearning validation are insufficient and need immediate updating to address representation-level data persistence
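CKA, one of the diagnostics listed above, compares two sets of activations computed on the same inputs. The sketch below implements the standard linear form, CKA(X, Y) = ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F) for column-centered X and Y, and applies it layer by layer. Whether Mirage uses this linear variant or a kernel variant is an assumption here, and `linear_cka`, `layerwise_cka`, and the layer names are hypothetical. In an audit, one would compare an unlearned model's activations with those of a model retrained without the forgotten data.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, p) and Y (n, q) whose
    rows correspond to the same n inputs in the same order."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature column
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))


def layerwise_cka(acts_a, acts_b):
    """Hypothetical helper: CKA per layer; acts_* map layer name -> (n, d) array."""
    return {name: linear_cka(acts_a[name], acts_b[name]) for name in acts_a}


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal matrix

# Synthetic activations (assumption): layer1 differs only by rotation and
# scaling, which CKA ignores; layer2 shares no structure at all.
acts_unlearned = {"layer1": X, "layer2": np.tanh(X)}
acts_retrained = {"layer1": 3.0 * X @ Q, "layer2": rng.normal(size=(500, 32))}

scores = layerwise_cka(acts_unlearned, acts_retrained)
for name, score in scores.items():
    print(f"{name}: CKA = {score:.3f}")  # layer1 prints 1.000, layer2 near 0
```

Because linear CKA is invariant to rotations and isotropic scaling of the feature space, it measures whether two layers encode the same geometry rather than the same coordinates, which is why it suits comparing independently trained networks.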