AI Summary
Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases that can successfully train models like GPT-2 and ConvNeXt. The work addresses gradient problems common with polynomial activations and shows these networks can be interpreted as multivariate polynomial mappings.
Key Takeaways
- New activation functions based on Hermite polynomial, Fourier trigonometric, and tropical polynomial bases can train deep neural networks effectively.
- The approach solves exploding and vanishing gradient problems typically associated with polynomial activations through variance-preserving initialization.
- Successfully demonstrated training of GPT-2 for text prediction and ConvNeXt for image classification using these novel activations.
- Networks with polynomial activations can be mathematically interpreted as multivariate polynomial mappings, providing new structural insights.
- The activations can approximate classical ones in pre-trained models using Hermite interpolation, making them useful for fine-tuning tasks.
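To make the variance-preservation idea in the takeaways concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the function names, the degree, and the unit-norm coefficient rule are assumptions chosen for illustration. It relies only on a standard fact: the probabilists' Hermite polynomials He_k, divided by sqrt(k!), are orthonormal under the standard Gaussian, so an activation whose coefficients have unit Euclidean norm has second moment 1 when its input is N(0, 1).

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_activation(x, coeffs):
    """Evaluate sum_k coeffs[k] * He_k(x) / sqrt(k!) elementwise.

    He_k are the probabilists' Hermite polynomials; dividing by sqrt(k!)
    makes the basis orthonormal under the standard Gaussian weight.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    norms = np.array([math.sqrt(math.factorial(k)) for k in range(coeffs.size)])
    return hermeval(x, coeffs / norms)


def gaussian_second_moment(coeffs, n_nodes=20):
    """Compute E[f(x)^2] for x ~ N(0, 1) with Gauss-Hermite quadrature.

    With n_nodes nodes the rule is exact for polynomials up to degree
    2 * n_nodes - 1, which covers f^2 for the low degrees used here.
    """
    nodes, weights = hermegauss(n_nodes)
    values = hermite_activation(nodes, coeffs)
    # The quadrature weights integrate against exp(-x^2 / 2), so divide
    # by sqrt(2 * pi) to turn the integral into a Gaussian expectation.
    return float(np.sum(weights * values**2) / math.sqrt(2.0 * math.pi))


# Hypothetical initialization: random degree-3 coefficients scaled to unit norm.
rng = np.random.default_rng(0)
raw = rng.standard_normal(4)
coeffs = raw / np.linalg.norm(raw)

second_moment = gaussian_second_moment(coeffs)
y_demo = hermite_activation(np.linspace(-2.0, 2.0, 5), coeffs)
print(second_moment)
```

Under these assumptions the printed second moment should sit very close to 1 regardless of which unit-norm coefficients are drawn, which is the property a variance-preserving initialization would aim to keep from layer to layer.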
#neural-networks #activation-functions #deep-learning #gpt-2 #convnext #polynomial #trigonometric #gradient-optimization #machine-learning
Read Original: via arXiv, CS AI