AI · Neutral · arXiv – CS AI · 11h ago · 7/10
Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition
Researchers introduce Hierarchical Attribution Graph Decomposition (HAGD), a method for extracting sparse circuits from billion-parameter language models that reduces the computational complexity of extraction from exponential to polynomial. The authors report that it identifies interpretable pathways in models ranging from GPT-2 to Llama-70B and reaches 91% behavioral preservation on modular arithmetic tasks, whereas existing methods such as ACDC become memory-prohibitive at 1.4B parameters.