AI · Neutral · arXiv – CS AI · 7h ago · 6/10
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
Researchers investigate feature stability in sparse autoencoders (SAEs) and find that features that are unstable across training runs are not pure noise: they concentrate in reproducible lower-rank subspaces. Stable features carry most of the functional signal for reconstruction and prediction. Unstable features have minimal individual impact, but they reflect shared geometric structure that different seeds resolve differently.
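
To make the distinction between unstable features and reproducible subspaces concrete, here is a minimal sketch on synthetic data. It is not the paper's method. It assumes each SAE run is summarized by a decoder matrix with one feature direction per row, uses maximum absolute cosine similarity across runs as a stand-in for feature stability, and compares subspaces through principal angles. The function names (`stability_scores`, `subspace_overlap`, `make_toy_runs`) and all sizes are hypothetical choices for illustration.

```python
import numpy as np


def unit_rows(W):
    """Scale each row (one feature direction) to unit length."""
    return W / np.linalg.norm(W, axis=1, keepdims=True)


def stability_scores(W_a, W_b):
    """For each feature in run A, the largest |cosine| with any feature in run B.

    Assumption: this max-cosine match is only a proxy for "the same feature
    was learned again"; it is not taken from the paper.
    """
    sims = unit_rows(W_a) @ unit_rows(W_b).T
    return np.abs(sims).max(axis=1)


def subspace_overlap(X, Y, rank):
    """Mean squared cosine of the principal angles between two row spaces.

    Each row space is truncated to its top `rank` right singular vectors.
    A value near 1.0 means the two sets of rows span nearly the same subspace.
    """
    Vx = np.linalg.svd(X, full_matrices=False)[2][:rank]
    Vy = np.linalg.svd(Y, full_matrices=False)[2][:rank]
    # Singular values of Vx @ Vy.T are the cosines of the principal angles.
    cosines = np.linalg.svd(Vx @ Vy.T, compute_uv=False)
    return float(np.mean(cosines ** 2))


def make_toy_runs(seed=0, d=64, n_stable=10, n_unstable=12, rank=6):
    """Build two synthetic 'runs' (hypothetical data, not real SAE weights).

    Both runs share the same stable directions. Their unstable directions
    differ, but all of them are mixtures of one shared low-rank basis.
    """
    rng = np.random.default_rng(seed)
    stable = rng.normal(size=(n_stable, d))
    basis = rng.normal(size=(rank, d))
    runs = []
    for _ in range(2):
        mix = rng.normal(size=(n_unstable, rank))  # a different mixture per run
        runs.append(np.vstack([stable, mix @ basis]))
    return runs[0], runs[1]


if __name__ == "__main__":
    n_stable, rank = 10, 6
    W_a, W_b = make_toy_runs(n_stable=n_stable, rank=rank)
    scores = stability_scores(W_a, W_b)
    print("mean score, stable features:  ", scores[:n_stable].mean())
    print("mean score, unstable features:", scores[n_stable:].mean())
    print("unstable subspace overlap:    ",
          subspace_overlap(W_a[n_stable:], W_b[n_stable:], rank))
```

In this toy setup the unstable features match poorly one by one across the two runs, yet the subspaces they span coincide, which is the pattern the summary describes. With real SAEs the split into stable and unstable features would have to come from a chosen threshold on the scores, and the rank would need to be estimated rather than known in advance.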