#feature-analysis News & Analysis

4 articles tagged with #feature-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 27 · 7/10

How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models

Researchers conducted the first systematic study of how weight pruning affects language model representations, using Sparse Autoencoders across multiple models and pruning methods. The study finds that rare features survive pruning better than common ones, suggesting that pruning acts as implicit feature selection, preserving specialized capabilities while removing generic features.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

Query Lens extends the Logit Lens technique to improve the interpretability of sparse autoencoders by analyzing both encoder key features and decoder value features while accounting for indirect downstream effects. The research also introduces the Subspace Channel Hypothesis, which suggests that neural modules process features through layer-specific subspaces.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

Researchers have developed a screening framework that predicts the unintended side effects of sparse autoencoder (SAE) steering in language models before any intervention is applied. By analyzing feature statistics, the framework identifies which steering interventions will behave consistently and avoid disrupting unrelated features, though its success varies across model architectures.

🧠 Llama

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Navigating the Concept Space of Language Models

Researchers have developed Concept Explorer, a scalable interactive system for exploring features from sparse autoencoders (SAEs) trained on large language models. The tool uses hierarchical neighborhood embeddings to organize thousands of SAE features into interpretable concept clusters, making it easier to discover and analyze the concepts a language model represents.