AIBullish · arXiv – CS AI · 6h ago · 7/10
Leviathan: Decoupling Input and Output Representations in Language Models
Researchers introduce Leviathan, a Transformer architecture that decouples input embeddings from output projections via learned embedding vectorization (LEV), achieving a 9% perplexity reduction at 1.2B parameters with minimal overhead. The gains concentrate on rare tokens, and the model needs 2.1x fewer training tokens to match baseline performance.
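To make the core idea concrete, here is a minimal NumPy sketch of the contrast between the usual tied-weights setup and a decoupled one. The specifics of LEV are not described in this summary, so the linear map `W_lev` below is purely a hypothetical stand-in for whatever learned transform the paper actually uses; only the decoupling itself is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 100, 16

# Tied baseline: one matrix serves as both the input embedding
# table and the output projection.
E = rng.normal(size=(vocab, d_model))

# Decoupled sketch: the output projection is derived from the input
# embeddings by a learned transform instead of reusing E directly.
# W_lev is a hypothetical placeholder; the paper's LEV mechanism
# may be entirely different.
W_lev = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
E_out = E @ W_lev  # output-side representations, shape (vocab, d_model)

def logits(hidden, proj):
    """Project a hidden state onto the vocabulary."""
    return hidden @ proj.T

h = rng.normal(size=(d_model,))
tied = logits(h, E)          # tied-weights logits, shape (100,)
decoupled = logits(h, E_out) # decoupled logits, shape (100,)
```

The point of decoupling is that the geometry useful for *reading* a token (input side) need not match the geometry useful for *predicting* one (output side); giving each side its own representation relaxes that constraint at the cost of the extra transform's parameters.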
🏢 Perplexity