AI · Neutral · Importance: 7/10
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
AI Summary
Researchers conducted the first systematic study of how weight pruning affects language model representations using Sparse Autoencoders across multiple models and pruning methods. The study reveals that rare features survive pruning better than common ones, suggesting pruning acts as implicit feature selection that preserves specialized capabilities while removing generic features.
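The analysis hinges on comparing how often each SAE feature fires on the base model versus the pruned model. As a rough illustration (not the paper's code; the encoder weights, threshold, and shapes here are hypothetical), a feature's firing rate can be computed from residual-stream activations passed through the SAE encoder:

```python
import numpy as np

def firing_rates(acts, W_enc, b_enc, thresh=0.0):
    """Fraction of tokens on which each SAE feature fires.

    Hypothetical shapes: acts is (tokens, d_model) model activations,
    W_enc is (d_model, n_features), b_enc is (n_features,).
    A feature 'fires' when its ReLU activation exceeds thresh.
    """
    feats = np.maximum(acts @ W_enc + b_enc, 0.0)  # SAE encoder + ReLU
    return (feats > thresh).mean(axis=0)           # per-feature firing rate

# Comparing firing_rates(base_acts, ...) against
# firing_rates(pruned_acts, ...) is one way to operationalize
# "feature survival" as the study describes it.
```

Rare features would be those with low base-model firing rates; the finding is that their rates change less after pruning than those of frequent features.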
Key Takeaways
- Rare SAE features with low firing rates survive pruning significantly better than frequent features across most experimental conditions.
- Wanda pruning preserves feature structure up to 3.7x better than magnitude pruning methods.
- Pre-trained SAEs remain viable on Wanda-pruned models up to 50% sparsity levels.
- Pruning acts as implicit feature selection, preferentially destroying high-frequency generic features while preserving specialized rare ones.
- Geometric feature survival does not predict causal importance, revealing a key dissociation for interpretability research.
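The Wanda-versus-magnitude comparison above rests on how the two methods score weights. A minimal sketch of the idea, assuming a (out, in) weight matrix and a calibration batch of input activations (names and shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.5):
    """Sketch of Wanda-style pruning: score each weight by
    |weight| * L2 norm of its input activation, then zero the
    lowest-scoring weights within each output row.

    W: (out_dim, in_dim) weight matrix.
    X: (tokens, in_dim) calibration activations.
    Magnitude pruning is the special case where the activation
    norm is ignored and weights are ranked by |W| alone.
    """
    act_norm = np.linalg.norm(X, axis=0)       # per-input-channel L2 norm
    scores = np.abs(W) * act_norm              # Wanda importance score
    k = int(W.shape[1] * sparsity)             # weights to drop per row
    pruned = W.copy()
    lowest = np.argsort(scores, axis=1)[:, :k]  # indices of weakest scores
    np.put_along_axis(pruned, lowest, 0.0, axis=1)
    return pruned
```

Because Wanda weighs each weight by how strongly its input channel actually activates, it tends to keep weights serving live features, which is one plausible reason it disturbs SAE feature structure less than plain magnitude pruning.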
Models mentioned: Llama (Meta)
#language-models #pruning #sparse-autoencoders #model-compression #interpretability #feature-analysis #gemma #llama #research
Read Original via arXiv (cs.AI)