y0news
#model-analysis1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 6h ago2
๐Ÿง 

CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

Researchers introduce CIRCUS, a new method for discovering mechanistic circuits in AI models that addresses uncertainty and brittleness issues in current approaches. The technique creates ensemble attribution graphs and extracts consensus circuits that are 40x smaller while maintaining explanatory power, validated on Gemma-2-2B and Llama-3.2-1B models.