AI · Neutral · arXiv – CS AI · 3h ago · 6/10
ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions
Researchers introduce Residualized Sparse Autoencoders (ReSAEs), a new technique that improves how transformer models are analyzed and modified by accounting for information flow across multiple layers. By training autoencoders on residual activations rather than raw activations, ReSAEs reduce redundancy and better preserve model functionality during multi-layer interventions.