AINeutralarXiv – CS AI · 7h ago6/10
🧠
Query Circuits: Explaining How Language Models Answer User Prompts
Researchers introduce query circuits, a method to trace how language models process specific inputs and generate outputs by identifying sparse, faithful neural pathways within the model itself. The approach achieves significant performance recovery using only 1.3% of model connections on benchmark tasks, offering more interpretable AI explanations than existing surrogate-based methods.