🤖 AI Summary
Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases, and show that models such as GPT-2 and ConvNeXt train successfully with them. The work addresses the exploding and vanishing gradients commonly associated with polynomial activations and shows that the resulting networks can be interpreted as multivariate polynomial mappings.
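The key mechanism named below is variance-preserving initialization for basis-expansion activations. A minimal sketch of the idea for the Hermite case (the paper's exact parameterization is not given here; the `hermite_basis` construction and the unit-norm coefficient constraint are illustrative assumptions): if the basis is orthonormal under a standard Gaussian, constraining the coefficient vector to unit norm keeps the activation's second moment at 1, which is what prevents activations and gradients from blowing up or shrinking across layers.

```python
import math
import numpy as np

def hermite_basis(x, degree):
    """Orthonormal probabilist's Hermite polynomials h_0 .. h_degree,
    evaluated at the points in x. They satisfy E[h_m(X) h_n(X)] = delta_mn
    for X ~ N(0, 1), which is what makes the variance analysis tractable."""
    x = np.asarray(x, dtype=float)
    H = [np.ones_like(x), x]
    for n in range(1, degree):
        # three-term recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        H.append(x * H[n] - n * H[n - 1])
    # normalize: h_n = He_n / sqrt(n!)
    return np.stack([H[n] / math.sqrt(math.factorial(n))
                     for n in range(degree + 1)])

def hermite_activation(x, coeffs):
    """phi(x) = sum_n c_n h_n(x), with coefficients rescaled so that
    sum_n c_n^2 = 1. For X ~ N(0, 1), orthonormality then gives
    E[phi(X)^2] = sum_n c_n^2 = 1: the second moment is preserved,
    which is the gist of variance-preserving initialization."""
    c = np.asarray(coeffs, dtype=float)
    c = c / np.linalg.norm(c)  # enforce sum c_n^2 = 1
    return c @ hermite_basis(x, len(c) - 1)
```

For Gaussian inputs, the output's second moment stays at 1 regardless of which unit-norm coefficients are chosen, so the degree of the polynomial can vary without rescaling the rest of the network.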
Key Takeaways
- New activation functions based on Hermite polynomial, Fourier trigonometric, and tropical polynomial bases can train deep neural networks effectively.
- The approach solves exploding and vanishing gradient problems typically associated with polynomial activations through variance-preserving initialization.
- Successfully demonstrated training of GPT-2 for text prediction and ConvNeXt for image classification using these novel activations.
- Networks with polynomial activations can be mathematically interpreted as multivariate polynomial mappings, providing new structural insights.
- The activations can approximate classical ones in pre-trained models using Hermite interpolation, making them useful for fine-tuning tasks.
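The last takeaway, approximating a classical activation in a Hermite basis, can be sketched as a Gaussian-weighted least-squares projection. The snippet below fits coefficients to the tanh-form GELU via Gauss-HermiteE quadrature; the paper's interpolation procedure may differ, and the function names and the choice of GELU here are illustrative assumptions:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite_basis(x, degree):
    """Orthonormal probabilist's Hermite polynomials h_0 .. h_degree."""
    x = np.asarray(x, dtype=float)
    H = [np.ones_like(x), x]
    for n in range(1, degree):
        H.append(x * H[n] - n * H[n - 1])  # He recurrence
    return np.stack([H[n] / math.sqrt(math.factorial(n))
                     for n in range(degree + 1)])

def gelu(x):
    # standard tanh approximation of GELU; stands in for the
    # "classical activation" being matched
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi)
                                    * (x + 0.044715 * x ** 3)))

def fit_hermite_coeffs(f, degree, quad_points=64):
    """c_n = E[f(X) h_n(X)] for X ~ N(0, 1), via Gauss-HermiteE quadrature.
    These coefficients give the degree-`degree` polynomial closest to f
    in the Gaussian-weighted least-squares sense."""
    nodes, weights = hermegauss(quad_points)      # weight exp(-x^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # -> expectations under N(0,1)
    return hermite_basis(nodes, degree) @ (weights * f(nodes))
```

Usage: `coeffs = fit_hermite_coeffs(gelu, 8)` yields a degree-8 polynomial `coeffs @ hermite_basis(x, 8)` that tracks GELU closely where Gaussian inputs concentrate, which is the property that makes such activations drop-in candidates when fine-tuning a pre-trained model.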
#neural-networks #activation-functions #deep-learning #gpt-2 #convnext #polynomial #trigonometric #gradient-optimization #machine-learning
Read Original → via arXiv – CS AI