AIBullisharXiv – CS AI · 10h ago7/10
🧠
Tapered Language Models
Researchers propose Tapered Language Models (TLMs), an architectural principle that allocates more parameters to earlier layers and fewer to later layers, contrary to the uniform allocation standard since the original transformer. Experiments across multiple model scales and architectures show this depth-aware capacity distribution improves perplexity and benchmark performance at no additional computational cost.
🏢 Perplexity