AI · Bullish · arXiv – CS AI · 15h ago · 6/10
More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations
Researchers propose Mixture of Activations (MoA), a feedforward network design that dynamically selects activation functions per token instead of applying one fixed function to all inputs. The paper's theoretical analysis proves that MoA is strictly more expressive than fixed-activation networks, and experiments on language models with up to 2B parameters show consistent loss improvements with minimal computational overhead.
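To make the per-token idea concrete, here is a minimal sketch of one way such a layer could look. It is not the paper's implementation: the softmax gate, the candidate set (ReLU, GELU, SiLU), and the names `moa_ffn`, `W_in`, `W_out`, and `W_gate` are all assumptions chosen for illustration, since the summary does not specify how the selection is computed.

```python
import math

# Candidate activations. This set is an assumption; the paper may use others.
def relu(z):
    return max(0.0, z)

def gelu(z):
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def silu(z):
    return z / (1.0 + math.exp(-z))

ACTIVATIONS = [relu, gelu, silu]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moa_ffn(x, W_in, W_out, W_gate):
    """Hypothetical token-adaptive feedforward layer for one token vector x.

    A gate computed from the token itself weights each candidate activation,
    so different tokens can see different effective nonlinearities.
    """
    gates = softmax(matvec(W_gate, x))  # one weight per candidate activation
    hidden = matvec(W_in, x)            # up-projection
    mixed = [
        sum(g * act(z) for g, act in zip(gates, ACTIVATIONS))
        for z in hidden
    ]
    return matvec(W_out, mixed), gates  # down-projection, plus gates for inspection

# Toy weights: model width 2, hidden width 3.
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_out = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
W_gate = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero gate weights give a uniform mix

output, gates = moa_ffn([1.0, -2.0], W_in, W_out, W_gate)
print(output, gates)
```

In this sketch the only extra cost over a fixed-activation layer is the small gate projection and the additional elementwise activations. A layer with one fixed activation is the special case in which the gate puts all its weight on a single candidate.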