AINeutralarXiv – CS AI · 9h ago6/10
🧠
Supervised sparse auto-encoders for interpretable and compositional representations
Researchers have developed supervised sparse auto-encoders (SAEs) that improve mechanistic interpretability of neural networks by addressing non-smoothness issues in L1 penalties and aligning learned features with human semantics. Validated on Stable Diffusion 3.5, the method enables compositional generalization and feature-level interventions for semantic image editing without prompt modification.
🧠 Stable Diffusion