#catastrophic-forgetting News & Analysis

56 articles tagged with #catastrophic-forgetting. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

Researchers introduce Foundation-Preserving LoRA (FoLoRA), an optimization framework for a central challenge in fine-tuning foundation models: retaining pre-trained capabilities while adapting to specialized downstream tasks. Using a generalized Rayleigh-quotient formulation, FoLoRA balances task-performance gains against the forgetting of pre-trained knowledge during training.
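As background only: a generalized Rayleigh quotient compares two quadratic forms. In the formulation below, $A$ (a symmetric matrix scoring task-performance gain along an update direction $\mathbf{v}$) and $B$ (a symmetric positive-definite matrix scoring forgetting along $\mathbf{v}$) are illustrative assumptions; the summary does not say how FoLoRA constructs them.

```latex
R(\mathbf{v}) = \frac{\mathbf{v}^\top A\,\mathbf{v}}{\mathbf{v}^\top B\,\mathbf{v}},
\qquad
\max_{\mathbf{v} \neq \mathbf{0}} R(\mathbf{v}) = \lambda_{\max},
\quad \text{attained at } \mathbf{v}^\star \text{ with } A\mathbf{v}^\star = \lambda_{\max} B\,\mathbf{v}^\star
```

Maximizing $R$ favors directions that buy the most task gain per unit of forgetting. For a rank-$r$ adapter, one natural extension would keep the top $r$ generalized eigenvectors of the pair $(A, B)$.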

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Task diversity produces systematic transfer but inhibits continual reinforcement learning

Researchers introduce Banyan, a benchmark for studying continual reinforcement learning, and use it to show that task diversity improves immediate transfer between tasks but does not sustain learning across multiple distribution shifts. Agents trained on diverse tasks generalize well to new task distributions, yet as training continues they forget earlier tasks and struggle with longer-horizon objectives.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

Researchers introduce AlphaToken, a framework that improves large language model post-training by valuing individual response tokens according to their contribution to both task adaptation and the preservation of pre-trained knowledge. The method uses gradient-based signals and a Fisher-drift proxy to identify high-value tokens, enabling more efficient fine-tuning and preference optimization while reducing catastrophic forgetting.
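The summary gives no formula for AlphaToken's token values. A minimal sketch, assuming each token already carries an adaptation score (for example, derived from gradients) and a Fisher-drift score, and that a token's value is simply their weighted difference; the function `select_tokens`, the weight `lam`, and `keep_frac` are hypothetical:

```python
from typing import List


def select_tokens(adapt_scores: List[float], drift_scores: List[float],
                  lam: float = 0.5, keep_frac: float = 0.5) -> List[int]:
    """Return indices (in sequence order) of the highest-value tokens.

    Hypothetical value rule: value_i = adapt_i - lam * drift_i, so tokens that
    help the new task but push the model far from its pre-trained state
    (high Fisher drift) are penalized.
    """
    if len(adapt_scores) != len(drift_scores):
        raise ValueError("score lists must have the same length")
    values = [a - lam * d for a, d in zip(adapt_scores, drift_scores)]
    k = max(1, round(keep_frac * len(values)))  # number of tokens to keep
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])  # restore sequence order, e.g. for a loss mask


# Token 1 adapts well but drifts a lot, so it is dropped.
print(select_tokens([0.9, 0.8, 0.2, 0.7], [0.1, 1.2, 0.0, 0.2]))  # [0, 3]
```

With `lam=0.5`, token 1 has the second-highest adaptation score but loses its place to token 3 because of its large drift penalty.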

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

Researchers introduce AdvCL, a novel framework that repurposes adversarial perturbations to improve continual learning in large language models by addressing forgetting, limited transfer, and adversarial vulnerability. The approach combines three modules—Intra-Smooth, Proto-Clip, and Inter-Align—to provide geometric control signals that stabilize model adaptation across sequential tasks while maintaining robustness.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Researchers demonstrate that reinforcement learning (RL) preserves internal computational circuits in large language models better than supervised fine-tuning (SFT) during task adaptation. Using a new metric called differential circuit vulnerability on Qwen2.5-3B-Instruct, they reveal a mechanistic trade-off: SFT adapts faster but causes substantial circuit disruption and capability forgetting, while RL maintains base model circuits at the cost of slower learning.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

🧠

TRACER: Persistent Regularization for Robust Multimodal Finetuning

Researchers introduce TRACER, a novel finetuning method for multimodal AI models that addresses catastrophic forgetting and out-of-distribution robustness degradation. By replacing standard Exponential Moving Average teachers with Weighted Moving Average teachers and combining contrastive learning with multi-perspective distillation, the approach delivers consistent performance gains across CLIP backbone architectures and is reported to be insensitive to hyperparameter choices.
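The summary does not specify TRACER's weighting scheme. As a hedged illustration of how the two teacher types differ, the sketch below applies a standard exponential moving average and a linearly weighted moving average (the linear weights and the window size are assumptions) to a single scalar parameter trajectory; a real teacher would apply the same update element-wise to every weight tensor, and a WMA teacher would need to keep the last `window` checkpoints.

```python
from typing import List


def ema(values: List[float], alpha: float = 0.9) -> List[float]:
    """Exponential moving average: s_t = alpha * s_{t-1} + (1 - alpha) * x_t."""
    out: List[float] = []
    s = values[0]  # initialize at the first value
    for x in values:
        s = alpha * s + (1 - alpha) * x
        out.append(s)
    return out


def wma(values: List[float], window: int = 3) -> List[float]:
    """Linearly weighted moving average over the last `window` values.

    The newest value gets weight len(recent); the oldest in the window gets 1.
    """
    out: List[float] = []
    for t in range(len(values)):
        recent = values[max(0, t - window + 1): t + 1]
        weights = range(1, len(recent) + 1)
        out.append(sum(w * x for w, x in zip(weights, recent)) / sum(weights))
    return out


# A parameter that jumps at the last step: the WMA tracks it faster than the EMA.
trajectory = [0.0, 0.0, 0.0, 10.0]
print(wma(trajectory)[-1])            # 5.0
print(round(ema(trajectory)[-1], 6))  # 1.0
```

The EMA's memory decays geometrically but never ends, while the WMA forgets anything older than its window entirely; which behavior suits a teacher is exactly the kind of design choice the paper targets.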

AI · Neutral · arXiv – CS AI · May 28 · 6/10

🧠

SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

Researchers introduce SAME, a new approach for training Multimodal Large Language Models (MLLMs) so that they can learn new tasks continually without forgetting previous capabilities. The method addresses fundamental problems in continual learning by stabilizing how AI systems route tasks to specialized expert networks and preventing knowledge degradation over time.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

🧠

Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

Researchers introduce a counterfactual-free circuit discovery method adapted for unstructured natural text, enabling Circuit-Targeted Supervised Fine-Tuning (CT-SFT) that improves low-resource model adaptation while preserving performance on source tasks and preventing catastrophic forgetting.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

🧠

UFO: A Unified Flow-Oriented Framework for Robust Continual Graph Learning

Researchers introduce UFO, a framework addressing robust continual graph learning by simultaneously tackling catastrophic forgetting and noisy data supervision in evolving graphs. The method uses flow-based generative modeling to mitigate forgetting and instance-level reliability scoring to handle corrupted labels, demonstrating superior performance across benchmark datasets.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

🧠

HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning

Researchers introduce HEDP, a domain incremental learning framework that enables AI models to adapt to new data domains without retraining by combining energy-based regularization with distance-based weighting mechanisms. The approach demonstrates a 2.57% accuracy improvement on unseen domains while reducing catastrophic forgetting, addressing a critical challenge in continual learning systems.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

🧠

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

Researchers introduce CRAFT, a continual learning framework for large language models that prevents catastrophic forgetting by learning low-rank interventions on hidden representations rather than updating model weights. The three-stage approach uses KL divergence-based routing and merging to enable models to acquire new capabilities while maintaining performance on previously learned tasks.
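CRAFT's exact intervention form is not given in the summary. The sketch below shows one simple way to edit a hidden representation with a low-rank update instead of changing model weights; the additive parameterization and the names `low_rank_intervention`, `down`, and `up` are assumptions, and the KL-based routing and merging stages are not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def low_rank_intervention(h: Vector, down: Matrix, up: Matrix) -> Vector:
    """Edit a d-dim hidden state with a rank-r update: h + up @ (down @ h).

    down is r x d and up is d x r. Only these small matrices would be trained
    for a new task; the frozen model's weights are never modified.
    """
    z = matvec(down, h)    # project the hidden state into r dimensions
    delta = matvec(up, z)  # map the edit back to d dimensions
    return [x + dx for x, dx in zip(h, delta)]


h = [1.0, 2.0, 3.0]
down = [[1.0, 0.0, 0.0]]    # r = 1: reads the first coordinate
up = [[0.0], [0.0], [1.0]]  # writes it into the third coordinate
print(low_rank_intervention(h, down, up))  # [1.0, 2.0, 4.0]
```

Because each task's edit lives in its own small pair of matrices, per-task interventions can be stored, selected, or merged without touching the shared backbone.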

AI · Neutral · arXiv – CS AI · May 9 · 6/10

🧠

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Researchers demonstrate that using the same optimizer during both pretraining and finetuning of large language models reduces catastrophic forgetting while maintaining or improving task performance. This "optimizer-model consistency" effect suggests optimizers create regularization patterns that preserve learned knowledge, with implications for efficient model adaptation strategies.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

🧠

Keep Rehearsing and Refining: Lifelong Learning Vehicle Routing under Continually Drifting Tasks

Researchers propose DREE, a novel lifelong learning framework for neural vehicle routing problem solvers that handles continually drifting task patterns with limited training resources per task. The approach addresses a gap in existing methods by managing catastrophic forgetting while learning sequential tasks in real-world logistics scenarios where problem patterns shift over time.

AI · Bullish · arXiv – CS AI · Apr 20 · 6/10

🧠

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

Researchers introduce JumpLoRA, a novel framework that uses sparse adapters with JumpReLU gating to enable continual learning in large language models while mitigating catastrophic forgetting. The method dynamically isolates parameters across tasks, outperforming existing state-of-the-art approaches like ELLA and significantly improving IncLoRA performance.
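JumpReLU is a thresholded activation: it passes an input through unchanged only when it exceeds a threshold θ, and outputs zero otherwise. A minimal sketch of how such a gate could sparsify a LoRA adapter's rank-dimension activations follows; applying the gate at that point, the fixed threshold value, and the function names are assumptions rather than JumpLoRA's published design (θ is often a learned parameter).

```python
from typing import List, Tuple


def jump_relu(x: float, theta: float) -> float:
    """JumpReLU: return x unchanged if x > theta, otherwise 0.

    Unlike a shifted ReLU, values just above the threshold keep their full
    magnitude, so the output jumps from 0 to about theta at x = theta.
    """
    return x if x > theta else 0.0


def gate_adapter_activations(z: List[float], theta: float) -> Tuple[List[float], float]:
    """Gate a low-rank adapter's intermediate activations and report sparsity."""
    gated = [jump_relu(v, theta) for v in z]
    sparsity = sum(1 for v in gated if v == 0.0) / len(gated)  # fraction zeroed
    return gated, sparsity


gated, sparsity = gate_adapter_activations([0.05, 0.8, -0.3, 0.4], theta=0.3)
print(gated, sparsity)  # [0.0, 0.8, 0.0, 0.4] 0.5
```

Zeroed components contribute nothing to the adapter's output for that input, which is one way different tasks could end up using disjoint subsets of adapter parameters.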

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

🧠

Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

Researchers propose Joint Flashback Adaptation, a novel method to address catastrophic forgetting in large language models during incremental task learning. The approach uses limited prompts from previous tasks combined with latent task interpolation, demonstrating improved performance across 1000+ instruction-following and reasoning tasks without requiring full replay data.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.

🧠 GPT-4 · 🧠 Llama

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

🧠

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

Researchers present Data Mixing Agent, an AI framework that uses reinforcement learning to automatically optimize how large language models balance training data from source and target domains during continual pre-training. The approach outperforms manual reweighting strategies while generalizing across different models, domains, and fields without requiring retraining.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Researchers propose DeLL, a new framework for autonomous driving systems that addresses lifelong learning challenges through dynamic knowledge spaces and causal inference mechanisms. The system uses Dirichlet process mixture models to prevent catastrophic forgetting and improve adaptability to new driving scenarios while maintaining previously learned knowledge.
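As background on why a Dirichlet process mixture helps here: the number of mixture components is not fixed in advance, so a sufficiently novel input can open a new component instead of overwriting an old one. The sketch below computes only the Chinese-restaurant-process prior over assignments; a full DPM would also weight each option by the observation's likelihood under that component, and how DeLL maps components to knowledge spaces is not described in the summary.

```python
from typing import List


def crp_probabilities(cluster_sizes: List[int], alpha: float) -> List[float]:
    """Chinese restaurant process prior for assigning one new observation.

    Returns P(join cluster k) = n_k / (n + alpha) for each existing cluster,
    followed by P(open a new cluster) = alpha / (n + alpha).
    """
    n = sum(cluster_sizes)
    denom = n + alpha
    return [size / denom for size in cluster_sizes] + [alpha / denom]


# Two known scenarios seen 3 and 1 times; alpha controls the appetite for novelty.
print(crp_probabilities([3, 1], alpha=1.0))  # [0.6, 0.2, 0.2]
```

Raising `alpha` makes new components more likely a priori, which trades stability on known scenarios for faster expansion to unfamiliar ones.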

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds

Researchers introduce CATFormer, a new spiking neural network architecture that tackles catastrophic forgetting in continual learning through dynamic-threshold neurons. The framework uses context-adaptive thresholds and task-agnostic inference to maintain knowledge across multiple learning tasks without performance degradation.
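The summary does not give CATFormer's neuron equations. As a hedged sketch of what a dynamic threshold means in a spiking neuron, the code below uses a common adaptive leaky integrate-and-fire model: each spike raises the firing threshold, which then decays back toward its base value. All parameter names and values are illustrative assumptions; CATFormer's context-adaptive thresholds may be computed quite differently.

```python
from typing import List


def simulate_adaptive_lif(inputs: List[float], leak: float = 0.9,
                          base_threshold: float = 1.0, th_decay: float = 0.9,
                          th_jump: float = 0.5) -> List[int]:
    """Leaky integrate-and-fire neuron whose threshold rises after each spike."""
    v = 0.0      # membrane potential
    extra = 0.0  # adaptive part of the threshold
    spikes: List[int] = []
    for x in inputs:
        v = leak * v + x                 # leaky integration of the input current
        if v >= base_threshold + extra:  # fire when the dynamic threshold is reached
            spikes.append(1)
            v = 0.0                      # hard reset after a spike
            extra += th_jump             # firing raises the threshold
        else:
            spikes.append(0)
        extra *= th_decay                # threshold relaxes toward its base value
    return spikes


print(simulate_adaptive_lif([1.2] * 5))               # [1, 0, 1, 0, 1]
print(simulate_adaptive_lif([1.2] * 5, th_jump=0.0))  # fixed threshold: [1, 1, 1, 1, 1]
```

Under a constant input, the adaptive threshold halves the firing rate here compared with a fixed threshold, illustrating how threshold dynamics can regulate how strongly a neuron responds.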

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

🧠

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

This comprehensive survey examines continual learning methodologies for large language models, focusing on three core training stages and methods to mitigate catastrophic forgetting. The research reveals that while current approaches show promise in specific domains, fundamental challenges remain in achieving seamless knowledge integration across diverse tasks and temporal scales.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

🧠

Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives

Researchers developed UNIFIER, a continual learning framework that lets multimodal large language models (MLLMs) adapt to changing visual scenarios without catastrophic forgetting. The framework addresses visual discrepancies across environments such as high-altitude, underwater, low-altitude, and indoor scenes, and shows significant improvements over existing methods.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

🧠

Gated Adaptation for Continual Learning in Human Activity Recognition

Researchers developed a new continual learning framework for human activity recognition (HAR) on IoT wearable devices that prevents AI models from forgetting previous tasks when learning new ones. Using gated adaptation and training only 2% of parameters, the method achieves 77.7% accuracy while reducing forgetting from 39.7% to 16.2%.
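The summary reports forgetting without defining the metric. One widely used definition in the continual-learning literature, average forgetting, is sketched below; whether the paper uses exactly this definition is an assumption, and the accuracy matrix in the example is made up for illustration.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Average forgetting after the final task.

    acc[i][j] is accuracy on task j measured right after training on task i
    (defined for j <= i). For each earlier task j, forgetting is its best
    accuracy before the final task minus its accuracy after the final task.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        best_before = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before - acc[-1][j])
    return sum(drops) / len(drops)


acc = [
    [0.90],
    [0.70, 0.85],
    [0.60, 0.80, 0.88],
]
print(round(average_forgetting(acc), 3))  # 0.175
```

Task 0 loses 0.30 from its best accuracy and task 1 loses 0.05, so the average is 0.175, or 17.5 percentage points.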

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

🧠

MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

Researchers propose MSSR (Memory-Inspired Sampler and Scheduler Replay), a new framework for continual fine-tuning of large language models that mitigates catastrophic forgetting while maintaining adaptability. The method estimates sample-level memory strength and schedules rehearsal at adaptive intervals, showing superior performance across three backbone models and 11 sequential tasks compared to existing replay-based strategies.
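MSSR's memory-strength estimator is not specified in the summary. A minimal sketch of the general idea, assuming an Ebbinghaus-style exponential forgetting curve R(t) = exp(-t/S) per sample: schedule the next replay when predicted retention would fall to a target, and strengthen the sample after each replay. The growth rule, the target value, and all function names are hypothetical.

```python
import math
from typing import List


def retention(elapsed_steps: float, strength: float) -> float:
    """Forgetting-curve estimate of how well a sample is still remembered."""
    return math.exp(-elapsed_steps / strength)


def next_rehearsal_interval(strength: float, target_retention: float = 0.8) -> float:
    """Steps until retention decays to the target: solve exp(-t/S) = r for t."""
    return -strength * math.log(target_retention)


def rehearsal_schedule(strength: float, n_reviews: int, growth: float = 2.0,
                       target_retention: float = 0.8) -> List[float]:
    """Training steps at which one sample is replayed.

    Hypothetical rule: each rehearsal multiplies the sample's memory strength
    by `growth`, so intervals lengthen as the sample is consolidated.
    """
    t = 0.0
    times: List[float] = []
    for _ in range(n_reviews):
        t += next_rehearsal_interval(strength, target_retention)
        times.append(round(t, 2))
        strength *= growth
    return times


print(rehearsal_schedule(10.0, 3))  # [2.23, 6.69, 15.62]
```

Weakly remembered samples (small `strength`) come back sooner, while consolidated ones are replayed at progressively longer intervals, which is the adaptive-interval behavior the summary describes.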

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

🧠

IDER: IDempotent Experience Replay for Reliable Continual Learning

Researchers propose IDER (Idempotent Experience Replay), a new continual learning method that addresses catastrophic forgetting in neural networks while improving prediction reliability. The approach uses idempotent properties to help AI models retain previously learned knowledge when acquiring new tasks, with demonstrated improvements in accuracy and reduced computational overhead.
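As background on the property in IDER's name: a mapping f is idempotent if f(f(x)) = f(x), so applying it a second time changes nothing. The sketch below measures how far a mapping is from that property on one input; how IDER feeds predictions back through the network and combines such a term with its replay buffer is not described in the summary, so the vector-to-vector `f` and the squared-error penalty are assumptions.

```python
from typing import Callable, List

Vector = List[float]


def idempotence_penalty(f: Callable[[Vector], Vector], x: Vector) -> float:
    """Squared L2 distance between f(f(x)) and f(x); zero when f is idempotent at x."""
    fx = f(x)
    ffx = f(fx)
    return sum((a - b) ** 2 for a, b in zip(ffx, fx))


def clip01(v: Vector) -> Vector:
    """Clipping to [0, 1] is idempotent: clipping twice changes nothing."""
    return [min(max(a, 0.0), 1.0) for a in v]


def halve(v: Vector) -> Vector:
    """Halving is not idempotent: applying it twice keeps shrinking the output."""
    return [0.5 * a for a in v]


print(idempotence_penalty(clip01, [-1.0, 0.5, 3.0]))  # 0.0
print(idempotence_penalty(halve, [2.0]))              # 0.25
```

Used as a training loss, a term like this would push the model toward predictions that are stable when re-applied, one plausible route to the reliability gains the summary mentions.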

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

🧠

Surgical Post-Training: Cutting Errors, Keeping Knowledge

Researchers introduce Surgical Post-Training (SPoT), a new method to improve Large Language Model reasoning while preventing catastrophic forgetting. SPoT achieved a 6.2% accuracy improvement on Qwen3-8B using only 4k data pairs and 28 minutes of training, offering a more efficient alternative to traditional post-training approaches.

Page 2 of 3