arXiv – CS AI
Interactions Between Crosscoder Features: A Compact Proofs Perspective
Researchers introduce a framework that uses compact proofs to measure interactions between the features learned by crosscoders and sparse autoencoders (SAEs), showing that these feature interactions cause reconstruction error. The work also demonstrates practical applications. These include computationally sparse models that retain 60% of performance while using a minimal number of features, and the detection of sleeper-agent behavior in AI systems.
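The toy sketch below makes the two central terms concrete. It is not the paper's method. It builds a sparse autoencoder with a ReLU encoder and a linear decoder, computes its reconstruction error, and applies one hypothetical way to quantify how two features interact: how far their joint effect under a nonlinear downstream map departs from the sum of their separate effects. All sizes, random weights, and function names (`encode`, `decode`, `reconstruction_error`, `pairwise_interaction`) are illustrative assumptions. The paper derives its own interaction measure from compact proofs, and that measure may differ.

```python
import numpy as np

# Hypothetical toy setup: 4-dim activations, 6 dictionary features, random weights.
rng = np.random.default_rng(0)
d_model, n_features = 4, 6

W_enc = rng.normal(size=(n_features, d_model))
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features))
W_dec /= np.linalg.norm(W_dec, axis=0, keepdims=True)  # unit-norm decoder columns


def encode(x):
    """ReLU encoder: non-negative feature activations."""
    return np.maximum(W_enc @ x + b_enc, 0.0)


def decode(f):
    """Linear decoder: the reconstruction is a sum of per-feature contributions."""
    return W_dec @ f


def reconstruction_error(x):
    """Squared L2 distance between an activation and its SAE reconstruction."""
    return float(np.sum((x - decode(encode(x))) ** 2))


def pairwise_interaction(f, i, j, downstream=lambda v: np.maximum(v, 0.0)):
    """Illustrative (assumed) interaction score for features i and j.

    Measures how far the downstream effect of both contributions together
    departs from the sum of their separate effects. It is zero whenever
    `downstream` is additive, e.g. the identity map.
    """
    ci = W_dec[:, i] * f[i]
    cj = W_dec[:, j] * f[j]
    joint = downstream(ci + cj)
    separate = downstream(ci) + downstream(cj)
    return float(np.linalg.norm(joint - separate))


x = rng.normal(size=d_model)
f = encode(x)
print(reconstruction_error(x))
print(pairwise_interaction(f, 0, 1))
```

Because the decoder is linear, any nonzero interaction score here comes entirely from the nonlinearity applied afterwards. This mirrors the intuition that per-feature explanations stop being exact once features are combined.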