AIBullisharXiv – CS AI · 9h ago7/10
🧠
L$^3$: Large Lookup Layers
Researchers introduce Large Lookup Layers (L³), a novel sparse architecture that generalizes embedding tables to decoder layers, enabling more efficient scaling than traditional Mixture-of-Experts models. The approach uses static token-based routing to aggregate learned embeddings contextually, achieving superior performance on language modeling tasks with up to 2.6B active parameters while maintaining hardware efficiency.