AIBullish · arXiv – CS AI · 10h ago · 7/10
🧠 Priming: Hybrid State Space Models From Pre-trained Transformers
Researchers introduce Priming, a method that converts pre-trained Transformers into efficient hybrid state-space models through knowledge transfer rather than training from scratch. The technique recovers downstream performance using less than 0.5% of the original pre-training tokens and enables the first large-scale comparison of SSM architectures: Hybrid GKA 32B gains 3.8 points on reasoning benchmarks while decoding 2.3x faster.
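The core idea — fitting a cheap recurrent (SSM-style) layer to imitate a frozen pre-trained layer instead of retraining from scratch — can be sketched in a toy form. This is an illustrative assumption, not the paper's actual procedure: the teacher here is a stand-in random linear layer, the student is a simple linear recurrence with a fixed decay, and the "knowledge transfer" step is a closed-form least-squares fit of the student's input projection to the teacher's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 64  # hidden size and sequence length (toy scale)

# Frozen "teacher" layer: a fixed random linear map standing in
# for a pre-trained attention layer (illustrative assumption).
W_teacher = rng.normal(size=(d, d)) * 0.1

def teacher(x):
    return x @ W_teacher

# Student: linear recurrence h_t = a * h_{t-1} + W_in^T x_t with
# scalar decay a. The recurrence is equivalent to multiplying by a
# lower-triangular mixing matrix A with A[t, s] = a**(t - s), s <= t.
a = 0.5
A = np.tril(a ** (np.arange(T)[:, None] - np.arange(T)[None, :]))

x = rng.normal(size=(T, d))
target = teacher(x)

# "Priming"-style transfer step (sketch): fit W_in so the recurrent
# layer reproduces the teacher's outputs on this batch, in closed form.
Ax = A @ x
W_in, *_ = np.linalg.lstsq(Ax, target, rcond=None)

student_out = A @ (x @ W_in)
err = np.linalg.norm(student_out - target) / np.linalg.norm(target)
print(f"relative distillation error: {err:.3f}")
```

In the real setting the student layers are trained with gradient descent on a small token budget rather than solved in closed form, but the shape of the problem — match a frozen teacher's activations layer by layer — is the same.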