CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
Researchers introduce CRAFT, a continual learning framework for large language models that prevents catastrophic forgetting by learning low-rank interventions on hidden representations rather than updating model weights. The three-stage approach uses KL divergence-based routing and merging to enable models to acquire new capabilities while maintaining performance on previously learned tasks.
CRAFT addresses a fundamental challenge in machine learning: enabling models to learn new tasks without degrading performance on existing ones. Catastrophic forgetting occurs when fine-tuning LLMs on new data overwrites previously learned patterns, forcing expensive retraining cycles. This research proposes an elegant solution by operating in representation space rather than weight space, using low-rank interventions that function as lightweight adapters.
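To make the representation-space idea concrete, the sketch below shows one plausible form of a low-rank intervention: an additive rank-r edit applied to a frozen layer's hidden states. The class name, module path, and hook wiring are illustrative assumptions; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class LowRankIntervention(nn.Module):
    """Illustrative low-rank edit applied to a layer's hidden states.

    The base model stays frozen; only 2 * hidden_dim * rank parameters are
    trained per task, acting as a lightweight adapter in representation space.
    """

    def __init__(self, hidden_dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # project into rank-r subspace
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # project back to hidden space
        nn.init.zeros_(self.up.weight)                        # start as a no-op intervention

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # h' = h + U V^T h : a rank-r additive edit of the hidden representation
        return hidden_states + self.up(self.down(hidden_states))

# Hypothetical attachment to one transformer layer via a forward hook;
# the module path depends on the model architecture.
# intervention = LowRankIntervention(hidden_dim=4096, rank=8)
# def hook(module, inputs, output):
#     return (intervention(output[0]),) + tuple(output[1:])
# model.model.layers[12].register_forward_hook(hook)
```

Because the edit is additive and initialized to zero, the intervened model starts out identical to the base model, and training moves only the small low-rank factors.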
The technical approach builds on existing adapter-based methods like LoRA but introduces principled task routing and merging guided by output-distribution divergence. By measuring how a new task's output distribution diverges from the model's existing behavior, CRAFT groups related tasks and regularizes adaptation with a KL-divergence penalty. This unified framework addresses three critical components (routing, regularization, and merging) through a single coherent objective, sketched below.
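The snippet below is a minimal sketch, under the assumption that the divergence in question is a token-level KL between output distributions; the exact routing rule, thresholds, and loss weights (lambda_kl, tau) are hypothetical placeholders, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def output_kl(logits_new: torch.Tensor, logits_ref: torch.Tensor) -> torch.Tensor:
    """Mean KL( p_new || p_ref ) over token positions.

    logits_*: [batch, seq_len, vocab]. A generic output-distribution divergence;
    CRAFT's precise routing/merging criterion may differ.
    """
    log_p_new = F.log_softmax(logits_new, dim=-1)
    log_p_ref = F.log_softmax(logits_ref, dim=-1)
    kl = torch.sum(log_p_new.exp() * (log_p_new - log_p_ref), dim=-1)  # per-token KL
    return kl.mean()

# Illustrative uses (names and thresholds are assumptions):
# 1) Regularization: keep the adapted model close to its prior behavior.
#    loss = task_loss + lambda_kl * output_kl(new_logits, previous_logits)
# 2) Routing: assign the new task to the existing intervention whose outputs
#    it diverges from least, or spawn a new intervention above a threshold tau.
#    scores = [output_kl(new_logits, logits_k) for logits_k in per_task_logits]
#    route = int(torch.tensor(scores).argmin()) if min(scores) < tau else None
```

The appeal of a single divergence measure is that the same quantity can drive routing decisions, act as the regularizer during training, and score which interventions are similar enough to merge.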
For AI practitioners and organizations deploying LLMs, CRAFT's implications are substantial. Continual learning capability enables models to adapt to new domains without catastrophic performance degradation, reducing the computational cost and complexity of maintaining multiple specialized models. The framework's robustness to task ordering removes a common practical problem where the order in which tasks are learned affects final performance.
The research demonstrates improvements across multiple model scales and benchmarks, suggesting the approach generalizes well. Future development will likely focus on scaling CRAFT to larger models and exploring how representation-space interventions interact with other adaptation techniques. Organizations considering continual learning pipelines should monitor this work's evolution, as efficient catastrophic forgetting mitigation directly impacts model economics and deployment flexibility.
- CRAFT prevents catastrophic forgetting by learning low-rank interventions on hidden representations rather than updating model weights.
- The framework uses KL divergence to guide task routing, regularization, and intervention merging through a unified objective.
- Performance improvements over strong LoRA-based baselines are demonstrated across multiple benchmarks and model scales.
- The approach is robust to task ordering, eliminating a common practical problem in continual learning scenarios.
- Efficient continual learning reduces the computational cost of maintaining specialized models for different domains.