AINeutralarXiv – CS AI · 6h ago6/10
🧠
How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural
Researchers measured the actual linearity of transformer feed-forward network blocks across multiple language models, finding that linearity varies dramatically between adjacent blocks and is learned during training rather than determined by architecture. This discovery enables targeted compression strategies and reveals methodological issues in evaluating transformer models.
🏢 Perplexity