Subliminal Learning Is Steering Vector Distillation
Researchers demonstrate that subliminal learning—where AI models inherit unrelated traits from teacher models—occurs through steering vectors embedded in activations rather than semantic content. The findings reveal that students learn aligned vectors during fine-tuning on steered teacher outputs, explaining why this transfer fails across different model architectures and highlighting the critical role of adaptive optimizers in this process.