AI · Bullish · arXiv CS AI · 6h ago
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
Researchers introduce CIRCUS, a method for discovering mechanistic circuits in AI models that addresses the uncertainty and brittleness of current approaches. The technique builds an ensemble of attribution graphs and extracts consensus circuits that are 40x smaller while maintaining explanatory power. It was validated on the Gemma-2-2B and Llama-3.2-1B models.
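
The summary does not say how the consensus is computed, so the following is only a rough sketch of one plausible reading: treat each attribution graph in the ensemble as a set of edges and keep the edges that recur in at least a chosen fraction of the graphs. The function name `consensus_circuit`, the `min_stability` threshold, and the toy node labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An edge is a (source_node, target_node) pair; the labels are illustrative only.
Edge = Tuple[str, str]


def consensus_circuit(graphs: Iterable[Set[Edge]], min_stability: float = 0.8) -> Set[Edge]:
    """Keep edges present in at least `min_stability` of the ensemble graphs.

    Assumption: edge-frequency thresholding stands in for whatever
    consensus rule CIRCUS actually uses.
    """
    graph_list = list(graphs)
    if not graph_list:
        return set()
    # Count in how many graphs each edge appears.
    counts = Counter(edge for graph in graph_list for edge in set(graph))
    total = len(graph_list)
    return {edge for edge, n in counts.items() if n / total >= min_stability}


# Toy ensemble: three attribution graphs from hypothetical perturbed runs.
ensemble = [
    {("emb", "attn0"), ("attn0", "mlp1"), ("mlp1", "logit")},
    {("emb", "attn0"), ("attn0", "mlp1"), ("emb", "mlp2")},
    {("emb", "attn0"), ("attn0", "mlp1"), ("mlp1", "logit")},
]

# Only edges found in every graph survive a threshold of 1.0.
circuit = consensus_circuit(ensemble, min_stability=1.0)
```

Lowering `min_stability` admits edges that appear in only some runs, trading a larger circuit for less sensitivity to any single attribution graph.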