SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning
Researchers introduce SAME, a new approach for training Multimodal Large Language Models that can continuously learn new tasks without forgetting previous capabilities. The method addresses fundamental problems in continual learning by stabilizing how AI systems route tasks to specialized expert networks and preventing knowledge degradation over time.
This research addresses a critical challenge in deploying advanced AI systems: the ability to learn new tasks continuously without catastrophically forgetting existing ones. Multimodal Large Language Models typically need costly retraining to pick up new capabilities without losing old ones, an expensive and impractical limitation for real-world applications. SAME tackles this through a mixture-of-experts architecture that maintains task specialization while preventing two related problems: router drift, where the routing of inputs to experts becomes unreliable, and expert drift, where shared experts lose functionality as they are updated on new material.
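To make the routing idea concrete, here is a minimal sketch of top-k expert routing and of how a perturbed router can change expert selection for an unchanged input. It is a generic illustration of sparse mixture-of-experts routing, not SAME's implementation; the function `top_k_route`, the dimensions, and the random perturbation standing in for "training on a new task" are all assumptions made for this example.

```python
import numpy as np

def top_k_route(x, router_w, k=2):
    """Return (expert indices, gate weights) for one token embedding x."""
    logits = x @ router_w                      # one score per expert
    top = np.argsort(logits)[::-1][:k]         # indices of the k highest scores
    shifted = logits[top] - logits[top].max()  # subtract max for a stable softmax
    gates = np.exp(shifted) / np.exp(shifted).sum()
    return top, gates

rng = np.random.default_rng(0)
d, n_experts = 8, 4                            # hypothetical sizes
x = rng.normal(size=d)                         # an input from an earlier task
router_w = rng.normal(size=(d, n_experts))

experts_before, gates_before = top_k_route(x, router_w)

# Stand-in for router drift: later training perturbs the router weights,
# which can change which experts the same old input is sent to.
drifted_w = router_w + rng.normal(scale=2.0, size=router_w.shape)
experts_after, gates_after = top_k_route(x, drifted_w)

print("experts before drift:", experts_before)
print("experts after drift: ", experts_after)
```

Comparing the two printed index sets shows whether the old input is still routed to the same experts once the router weights have moved.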
The broader context reveals growing recognition that static AI models cannot meet production demands. As organizations deploy MLLMs for grounded tasks, optical character recognition, and other specialized functions simultaneously, the ability to incrementally add capabilities becomes commercially essential. Previous sparse routing approaches failed because expert selection became unstable as data distributions shifted, and shared components were overwritten by new training tasks.
SAME's two technical innovations, orthogonal subspace decomposition for routing stability and curvature-aware scaling for expert preservation, demonstrate progress toward more adaptable AI systems. The introduction of a benchmark for long task sequences reflects a shift in the field toward evaluating practical continual learning scenarios. For developers building multimodal AI systems, this represents a step toward more efficient, maintainable deployments that don't require expensive periodic retraining.
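The two ideas named above can be illustrated in general form. The sketch below is not taken from the paper: it assumes that "orthogonal subspace" means projecting router updates away from directions spanned by earlier-task inputs, and that "curvature-aware scaling" means shrinking updates on parameters with a high importance estimate. The helper names `old_task_basis`, `project_out`, and `curvature_scaled`, and all sizes, are hypothetical.

```python
import numpy as np

def old_task_basis(features, rank):
    """Orthonormal basis (d x rank) for the dominant directions of old-task inputs."""
    # features has shape (n_samples, d); right singular vectors span input directions.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T

def project_out(grad, basis):
    """Remove the part of a router gradient (d x n_experts) lying in span(basis)."""
    return grad - basis @ (basis.T @ grad)

def curvature_scaled(grad, curvature, strength=1.0):
    """Shrink updates where the (assumed non-negative) curvature estimate is large."""
    return grad / (1.0 + strength * curvature)

rng = np.random.default_rng(0)
d, n_experts, rank = 8, 4, 3                   # hypothetical sizes

# Old-task inputs that lie exactly in a rank-3 subspace of the d-dim space.
features = rng.normal(size=(20, rank)) @ rng.normal(size=(rank, d))
basis = old_task_basis(features, rank)

router_w = rng.normal(size=(d, n_experts))
grad = rng.normal(size=(d, n_experts))         # gradient from a new task

# Update the router only along directions orthogonal to old-task inputs.
new_router_w = router_w - 0.1 * project_out(grad, basis)

# Router logits for old-task inputs are unchanged by the projected update.
logits_unchanged = np.allclose(features @ router_w, features @ new_router_w)

# Separately, damp an expert update using a stand-in curvature estimate.
curvature = rng.uniform(0.0, 5.0, size=grad.shape)
damped = curvature_scaled(grad, curvature)
```

In this toy setting the old inputs lie exactly in the protected subspace, so their routing scores are preserved exactly; with real features the subspace would only approximate the old-task directions, and the preservation would be approximate as well.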
The significance lies not in immediate market disruption but in enabling infrastructure that makes continuous AI capability expansion viable at scale. Organizations experimenting with continual learning pipelines should monitor this approach's adoption, as it could influence how production AI systems evolve and optimize over time.
- SAME addresses router drift and expert drift in continual multimodal model training through orthogonal decomposition and curvature-aware updates.
- The method enables task specialization while preventing knowledge degradation when learning new capabilities sequentially.
- A new benchmark for long task sequences provides an evaluation framework for realistic continual learning scenarios.
- The rehearsal-free approach reduces computational overhead by freezing selected experts during training.
- The research advances toward production-ready continual learning for multimodal AI systems without complete model retraining.