RULER: Representation-Level Verification of Machine Unlearning
Researchers introduce RULER, a verification framework that detects machine unlearning failures at the representation level rather than only at the output level. The study finds that popular unlearning methods pass traditional evaluation tests yet still retain encoded information about the forgotten data in their internal representations, highlighting a critical gap in current verification protocols.
Machine unlearning, the ability to remove the influence of specific training data from a deployed model, has become increasingly important for privacy compliance and model safety. Existing verification methods check output-level metrics such as membership inference and accuracy retention. RULER exposes a fundamental blind spot in that approach: a model can pass all standard tests while still internally encoding the forgotten information. This is a significant security and privacy concern, because intermediate representations may leak sensitive data through side-channel attacks or fine-tuning exploits.
The research demonstrates that four established unlearning methods all fail representation-level verification despite passing output-level benchmarks, and that the residuals become more pronounced as the fraction of forgotten data increases. The oracle-free metric M4 proves particularly valuable as a diagnostic tool: it requires no retrained reference model, and it detects identity-level memorization in face recognition systems where conventional methods produce false negatives.
This work addresses a practical vulnerability in the machine learning ecosystem. Organizations implementing unlearning for regulatory compliance or privacy protection may inadvertently deploy systems that retain sensitive information at the representation level. The findings suggest that current standards for evaluating unlearning effectiveness are insufficient and that representation-level verification should be adopted before deployment. The multi-domain validation across tabular, image, clinical, and biometric datasets supports RULER's generalizability, making it a critical tool for ML practitioners working with sensitive data. Future work should focus on developing unlearning methods that pass output-level and representation-level verification simultaneously.
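The difference between an output-level check and a representation-level check can be made concrete with a small sketch. The summary does not specify how RULER's metrics are computed, so the code below is not RULER itself. It is a generic illustration that uses linear centered kernel alignment (CKA) as a stand-in similarity measure and synthetic arrays in place of real hidden-layer activations. The names `original`, `retrained`, `unlearned`, and `residual_gap` are hypothetical. The idea: if the unlearned model's activations on the forget set stay much closer to the original model than to a reference model retrained without that data, information may still be encoded internally.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA similarity between two (n_samples, n_features) activation matrices."""
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# ASSUMPTION: synthetic stand-ins for activations on the forget set.
# In practice these would be extracted from the same layer of three models.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 16
original = rng.normal(size=(n_samples, n_features))   # model before unlearning
retrained = rng.normal(size=(n_samples, n_features))  # reference retrained without the forget set
# Simulate an unlearned model that barely moved away from the original.
unlearned = original + 0.1 * rng.normal(size=(n_samples, n_features))

sim_to_original = linear_cka(unlearned, original)
sim_to_retrained = linear_cka(unlearned, retrained)
residual_gap = sim_to_original - sim_to_retrained

print(f"CKA(unlearned, original)  = {sim_to_original:.3f}")
print(f"CKA(unlearned, retrained) = {sim_to_retrained:.3f}")
print(f"residual gap              = {residual_gap:.3f}")
```

A large positive gap in this toy setup flags a model whose internals still resemble the pre-unlearning state, even if its outputs on the forget set look acceptable. This version depends on a retrained reference model, which is exactly the cost an oracle-free metric avoids.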
- RULER reveals that four popular unlearning methods pass output-level verification while failing representation-level checks, exposing a critical evaluation gap
- The oracle-free metric M4 detects memorization without requiring model retraining, enabling practical pre-deployment diagnostics
- Representation-level residuals grow significantly as the fraction of forgotten data increases, with statistical significance detected in 10 of 12 tested conditions
- Face recognition models show persistent identity-level memorization despite unlearning methods claiming full data erasure
- Current privacy compliance strategies relying on standard unlearning metrics may provide false assurance of adequate data removal
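To illustrate what an oracle-free diagnostic could look like, here is a minimal sketch. The summary does not define M4, so this is not M4. It is a generic linear probe, with assumed names (`probe_accuracy`, `forget_feats`, `unseen_feats`) and synthetic features. The probe tries to separate the unlearned model's representations of forgotten samples from its representations of comparable samples it never saw. Held-out accuracy well above 50% suggests the forgotten data still leaves a distinguishable trace, and no retrained reference model is needed.

```python
import numpy as np


def probe_accuracy(forget_feats: np.ndarray, unseen_feats: np.ndarray, seed: int = 0) -> float:
    """Held-out accuracy of a least-squares linear probe separating the two feature sets."""
    rng = np.random.default_rng(seed)
    x = np.vstack([forget_feats, unseen_feats])
    y = np.concatenate([np.ones(len(forget_feats)), -np.ones(len(unseen_feats))])
    # Shuffle, then fit on the first half and evaluate on the second half.
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    split = len(xb) // 2
    weights, *_ = np.linalg.lstsq(xb[:split], y[:split], rcond=None)
    preds = np.sign(xb[split:] @ weights)
    return float((preds == y[split:]).mean())


# ASSUMPTION: synthetic features standing in for one layer of an unlearned model.
rng = np.random.default_rng(1)
n, d = 300, 8
unseen_feats = rng.normal(size=(n, d))             # samples the model never trained on
leaky_forget_feats = rng.normal(size=(n, d)) + 1.0  # forgotten samples that still look different
clean_forget_feats = rng.normal(size=(n, d))        # forgotten samples that look like unseen ones

leaky_acc = probe_accuracy(leaky_forget_feats, unseen_feats)
clean_acc = probe_accuracy(clean_forget_feats, unseen_feats)
print(f"probe accuracy, residual present: {leaky_acc:.2f}")
print(f"probe accuracy, no residual:      {clean_acc:.2f}")
```

In a real evaluation the unseen set would need to match the forget set's distribution, since otherwise the probe could pick up on ordinary distribution shift instead of memorization.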