#activation-functions News & Analysis

6 articles tagged with #activation-functions. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

6 articles

AIBullisharXiv – CS AI · Mar 37/104

🧠

Polynomial, trigonometric, and tropical activations

Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases that can successfully train models like GPT-2 and ConvNeXt. The work addresses gradient problems common with polynomial activations and shows these networks can be interpreted as multivariate polynomial mappings.

AIBullisharXiv – CS AI · 5d ago6/10

🧠

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

Researchers propose Mixture of Activations (MoA), a novel feedforward network design that dynamically selects activation functions per token rather than applying a single fixed function across all inputs. Theoretical analysis proves MoA offers strict expressivity advantages over fixed-activation networks, while empirical testing on language models up to 2B parameters demonstrates consistent improvements in loss metrics with minimal computational overhead.

AINeutralarXiv – CS AI · May 126/10

🧠

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

Researchers empirically validate theoretical predictions about feature repulsion in neural network grokking, discovering that while the mathematical sign structure holds consistently across activation functions, the spectral signature of this mechanism in weight updates depends critically on activation type—appearing sharply in quadratic activations but remaining invisible in ReLU networks.

AIBullisharXiv – CS AI · Mar 176/10

🧠

PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks

Researchers introduce PolyGLU, a new transformer architecture that enables dynamic routing among multiple activation functions, mimicking biological neural diversity. The 597M-parameter PolychromaticLM model shows emergent specialization patterns and achieves strong performance despite training on significantly fewer tokens than comparable models.

🏢 Nvidia

AIBullisharXiv – CS AI · Mar 27/1016

🧠

Activation Function Design Sustains Plasticity in Continual Learning

Researchers from arXiv demonstrate that activation function design is crucial for maintaining neural network plasticity in continual learning scenarios. They introduce two new activation functions (Smooth-Leaky and Randomized Smooth-Leaky) that help prevent models from losing their ability to adapt to new tasks over time.

$LINK

AIBullisharXiv – CS AI · Feb 276/108

🧠

GRAU: Generic Reconfigurable Activation Unit Design for Neural Network Hardware Accelerators

Researchers propose GRAU, a new reconfigurable activation unit design for neural network hardware accelerators that uses piecewise linear fitting with power-of-two slopes. The design reduces LUT consumption by over 90% compared to traditional multi-threshold activators while supporting mixed-precision quantization and nonlinear functions.