CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

arXiv – CS AI | Swapnil Parekh
AI Summary

Researchers introduce CIRCUS, a method for discovering mechanistic circuits in AI models that addresses the uncertainty and brittleness of current approaches. The technique builds an ensemble of attribution graphs and extracts strict-consensus circuits that are roughly 40x smaller than union configurations while retaining comparable explanatory power. It is validated on the Gemma-2-2B and Llama-3.2-1B models.

Key Takeaways
  • CIRCUS addresses the brittleness of current mechanistic circuit discovery methods by treating circuit discovery as an uncertainty quantification problem.
  • The method produces strict-consensus circuits that are ~40x smaller than union configurations while retaining comparable explanatory power.
  • CIRCUS requires no model retraining and adds negligible computational overhead to existing attribution methods.
  • Validation through activation patching shows consensus-identified nodes consistently outperform non-consensus controls.
  • The framework provides an explicit core/contingent/noise decomposition, making interpretability results more trustworthy and auditable.
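
The consensus idea in the takeaways above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. It assumes each ensemble run has already been reduced to a set of edges; the function name `decompose_edges`, the toy edge labels, and the `core_min` / `noise_max` thresholds are all hypothetical. Edges present in every run form the strict-consensus core, rarely seen edges are treated as noise, and everything in between is contingent.

```python
from collections import Counter


def decompose_edges(edge_sets, core_min=1.0, noise_max=0.25):
    """Split edges into core / contingent / noise by how often they recur.

    edge_sets: one set of edges per ensemble run (assumed input format).
    core_min, noise_max: illustrative stability thresholds, not values
    taken from the paper.
    """
    if not edge_sets:
        raise ValueError("need at least one attribution graph")
    if noise_max >= core_min:
        raise ValueError("noise_max must be below core_min")

    n_runs = len(edge_sets)
    # Count, for each edge, the number of runs in which it appears.
    counts = Counter(edge for edges in edge_sets for edge in set(edges))
    stability = {edge: count / n_runs for edge, count in counts.items()}

    core = {e for e, s in stability.items() if s >= core_min}
    noise = {e for e, s in stability.items() if s <= noise_max}
    contingent = set(stability) - core - noise
    return core, contingent, noise, stability


# Toy ensemble of five runs with made-up edge labels.
runs = [
    {("a", "b"), ("b", "c"), ("x", "y")},
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("c", "d")},
]
core, contingent, noise, stability = decompose_edges(runs)
union = set(stability)
print(len(union), len(core))  # union has 4 edges, strict consensus has 1
```

With the default `core_min=1.0`, the core is the plain intersection of all runs, which is why it can be far smaller than the union. Lowering `core_min` would trade compactness for coverage.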