Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

arXiv – CS AI | Khumaisa Nur'aini, Ayu Purwarianti, Alham Fikri Aji, Derry Wijaya
AI Summary

Researchers introduce a counterfactual-free circuit discovery method adapted for unstructured natural text. It enables Circuit-Targeted Supervised Fine-Tuning (CT-SFT), which improves low-resource model adaptation while preserving source-task performance and limiting catastrophic forgetting.

Analysis

This research addresses a persistent challenge in machine learning: adapting large language models to new tasks with minimal data while maintaining their existing capabilities. Traditional circuit discovery methods have relied on templated tasks with carefully constructed counterfactuals, which restricts their practical applicability. The authors adapt Contextual Decomposition for Transformers to identify circuits in real-world, unstructured text, using label-balanced activation means and task-directional relevance scoring, which removes the need for synthetic counterfactuals.
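To make the idea of a label-balanced activation mean concrete, here is a minimal sketch in plain Python. It assumes the mean is computed by averaging per-label means so that every label contributes equally. The function name `label_balanced_mean` and the toy vectors are hypothetical, and the paper's actual procedure may differ in detail.

```python
from collections import defaultdict


def label_balanced_mean(activations, labels):
    """Average the per-label mean vectors so each label counts equally.

    Illustrative assumption: this is one natural reading of a
    "label-balanced" mean, not necessarily the authors' definition.
    """
    by_label = defaultdict(list)
    for vec, lab in zip(activations, labels):
        by_label[lab].append(vec)

    dim = len(activations[0])
    # Mean activation vector within each label.
    class_means = [
        [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        for vecs in by_label.values()
    ]
    # Unweighted mean over labels, regardless of how many examples each has.
    return [sum(m[i] for m in class_means) / len(class_means) for i in range(dim)]


# Toy, made-up activations: three "pos" examples and one "neg" example.
acts = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
labels = ["pos", "pos", "pos", "neg"]

balanced = label_balanced_mean(acts, labels)
plain = [sum(v[i] for v in acts) / len(acts) for i in range(2)]
print(balanced, plain)
```

On imbalanced data the plain mean drifts toward the majority label, while the balanced mean sits midway between the two label means. That is presumably why a balanced baseline is a reasonable stand-in when no counterfactual inputs are available.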

The innovation lies in how circuits are used for targeted fine-tuning. Rather than updating all model parameters, CT-SFT restricts updates to causally relevant attention heads and LayerNorm components. This mechanistic approach contrasts with conventional sparse fine-tuning, which often recruits additional model capacity to reach similar accuracy. The distinction matters for robustness, because parameters outside the identified circuit are left unchanged.
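The restriction of updates to circuit components can be pictured as a masked gradient step. The sketch below is framework-free and purely illustrative: the parameter names (`layer0.head3`, `layer0.ln`, and so on), the `circuit` set, and the plain SGD rule are all assumptions, not the authors' implementation.

```python
def masked_sgd_step(params, grads, trainable, lr=0.5):
    """Apply one SGD step only to parameters named in `trainable`.

    Every other parameter is copied through unchanged, which mimics
    freezing everything outside the discovered circuit.
    """
    updated = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


# Hypothetical parameter groups; real models expose far more of them.
params = {
    "layer0.head1": [1.0, 1.0],
    "layer0.head3": [1.0, 1.0],
    "layer0.ln": [1.0],
    "layer0.mlp": [1.0, 1.0],
}
grads = {name: [0.5] * len(w) for name, w in params.items()}

# Assumed output of circuit discovery: one attention head plus a LayerNorm.
circuit = {"layer0.head3", "layer0.ln"}

new_params = masked_sgd_step(params, grads, circuit)
print(new_params)
```

In a real training loop the same effect is usually obtained by disabling gradients for frozen parameters or by masking gradients per attention head. The point here is only that components outside the circuit end the step with exactly the values they started with.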

Experimental results on cross-lingual sentiment transfer (NusaX) and natural language inference (XNLI) show that CT-SFT achieves competitive accuracy in low-resource settings while preserving performance on source languages and related tasks. Full fine-tuning and non-circuit sparse methods occasionally match its target accuracy, but they frequently suffer catastrophic forgetting, degrading the model's original knowledge. CT-SFT avoids this trade-off through causal grounding: it updates only the components directly responsible for the target task.
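One simple way to quantify the forgetting described above is the drop in source-task accuracy after adaptation. The following sketch assumes that definition. The helper names and the toy predictions are hypothetical and are not results from the paper.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def forgetting(preds_before, preds_after, gold):
    """Source-task accuracy lost after adapting to the target task.

    Positive values mean the adapted model does worse on the source task.
    """
    return accuracy(preds_before, gold) - accuracy(preds_after, gold)


# Toy source-task labels and predictions, made up for illustration only.
source_gold = [1, 0, 1, 1]
before_adaptation = [1, 0, 1, 1]
after_adaptation = [1, 1, 0, 1]

drop = forgetting(before_adaptation, after_adaptation, source_gold)
print(drop)
```

Reporting this drop alongside target accuracy makes the trade-off explicit: two methods can reach the same target accuracy and still differ in how much source performance they give up.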

This work has implications for practitioners deploying models across multiple languages and tasks simultaneously, particularly in resource-constrained environments. It suggests that understanding model circuits (the computational pathways underlying specific behaviors) enables more reliable adaptation strategies than capacity-based alternatives.

Key Takeaways
  • CT-SFT achieves competitive low-resource adaptation accuracy while minimizing catastrophic forgetting of source-task knowledge.
  • Circuit discovery is now viable on unstructured natural text without synthetic counterfactuals, broadening applicability beyond templated tasks.
  • Targeting parameter updates to task-relevant circuits provides safer adaptation than global fine-tuning or general sparse methods.
  • Results hold across multiple language pairs and tasks (NusaX, XNLI), indicating generalization beyond single-domain applications.
  • Mechanistic interpretability through circuits offers a causally grounded alternative to capacity recruitment for model adaptation.