AI · Neutral · arXiv – CS AI · 5h ago
Circuit Insights: Towards Interpretability Beyond Activations
Researchers introduce WeightLens and CircuitLens, two new methods for neural network interpretability that go beyond traditional activation-based approaches. The tools aim to make the analysis of neural network circuits more systematic and scalable by interpreting features directly from weights and by capturing interactions between features.