AI · Bullish · arXiv – CS AI · 5h ago · 7/10
Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers
Researchers introduce ViSAE, a mechanistic interpretability toolbox that applies neuroscience-inspired principles to decode how Vision Transformers (ViTs) make decisions, expressing those decisions as human-interpretable concept circuits. The authors report gains in both model auditing and steering: concept editing improves worst-group accuracy by 48.2% on benchmark tests, addressing critical safety concerns before ViTs are deployed.