Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Researchers introduce the Memory-Efficient Looped Transformer (MELT), an architecture that decouples reasoning depth from memory consumption in recurrent language models. Instead of maintaining a separate key-value cache for every reasoning loop, MELT keeps a single shared cache per layer and updates it through a learnable gate. Iterative reasoning therefore runs at the constant memory cost of a standard LLM, while outperforming standard models on benchmarks.
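The gated shared-cache update is the core mechanism. Below is a minimal PyTorch sketch of one plausible form of it: the class name `GatedSharedKVCache`, the `k_gate`/`v_gate` linear layers, and the per-channel sigmoid gating are all illustrative assumptions, not the paper's confirmed parameterization.

```python
import torch
import torch.nn as nn

class GatedSharedKVCache(nn.Module):
    """Hypothetical sketch of MELT-style gated cache updates.

    Rather than appending a fresh KV cache for each reasoning loop,
    one cache per layer is blended with the loop's new keys/values
    via a learned sigmoid gate, so memory stays constant no matter
    how many loops are run.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # Assumed parameterization: per-channel gates conditioned on
        # both the old cache and the new KV states.
        self.k_gate = nn.Linear(2 * d_model, d_model)
        self.v_gate = nn.Linear(2 * d_model, d_model)

    def forward(self, cache_k, cache_v, new_k, new_v):
        # All tensors: (batch, seq_len, d_model).
        gk = torch.sigmoid(self.k_gate(torch.cat([cache_k, new_k], dim=-1)))
        gv = torch.sigmoid(self.v_gate(torch.cat([cache_v, new_v], dim=-1)))
        # Convex blend: the cache is overwritten in place, so its size
        # is independent of the number of reasoning loops.
        cache_k = gk * new_k + (1.0 - gk) * cache_k
        cache_v = gv * new_v + (1.0 - gv) * cache_v
        return cache_k, cache_v
```

Compare this with the standard looped-transformer baseline the summary describes, where each loop appends its own cache and memory grows linearly with loop count; here the blend keeps a single fixed-size cache per layer.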