AINeutralarXiv โ CS AI ยท 8h ago7/10
๐ง
Do Sparse Autoencoders Capture Concept Manifolds?
Researchers demonstrate that sparse autoencoders (SAEs) capture semantic concepts along low-dimensional manifolds rather than isolated linear directions, revealing that existing architectures suboptimally recover these continuous structures through a fragmented approach called dilution. The findings suggest future interpretability methods should treat geometric objects as fundamental units rather than individual feature directions.