Researchers discovered that continuous-time RNNs trained with noise injected inside activation functions paradoxically perform best when noise remains present at test time, contradicting conventional assumptions about noise removal. This phenomenon stems from noise-induced shifts in neural network dynamics that become computationally integrated into learned representations, revealing that networks can overfit to training noise itself rather than just input-output mappings.
This research challenges a fundamental assumption in neural network training: that noise serves only as a regularization tool to be discarded during deployment. The study reveals that RNNs don't simply tolerate training noise; they actively incorporate it into their computational mechanisms. When noise is injected inside activation functions, networks come to depend on the noise level used during training, and performance degrades when that noise is removed at inference time. This occurs because noise is attenuated asymmetrically near activation-function nonlinearities, causing systematic bias shifts in network dynamics that become functionally necessary for optimal output. The phenomenon is a form of overfitting distinct from traditional data overfitting, in which networks learn noise patterns rather than generalizable input-output relationships. The research demonstrates the effect across multiple architectures and tasks, indicating that it is a robust property of how RNNs learn under noisy conditions. This finding has significant implications for biological neural network modeling, since many RNN architectures are designed to emulate brain function, where biological noise is ubiquitous. For AI practitioners, the results suggest that the location and magnitude of noise injection during training directly influence deployment requirements, so noise parameters need careful calibration rather than a default assumption that noise should be minimized. The work also provides theoretical grounding through an analysis of fixed-point dynamics, which explains why networks with noise outside activation functions don't exhibit this preference. Future research should explore whether similar phenomena occur in modern deep learning architectures and how to design networks that are resilient to noise-level changes during deployment.
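The bias shift described above can be illustrated with a single unit. The sketch below is not the study's code: it assumes a ReLU activation and zero-mean Gaussian noise, and `mean_shift` is a hypothetical helper introduced only for this illustration. It estimates how far the average noisy output moves from the noise-free output when noise is added before versus after the nonlinearity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mean_shift(pre_activation, sigma, inside, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[noisy output] - relu(pre_activation).

    Assumptions (not from the source): ReLU activation, Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n_samples)
    if inside:
        # Noise passes through the nonlinearity: negative swings are clipped.
        noisy = relu(pre_activation + noise)
    else:
        # Noise is added after the nonlinearity: it stays zero-mean.
        noisy = relu(pre_activation) + noise
    return float(noisy.mean() - relu(pre_activation))

# Unit sitting exactly at the ReLU threshold.
shift_inside = mean_shift(0.0, sigma=0.5, inside=True)
shift_outside = mean_shift(0.0, sigma=0.5, inside=False)
# Unit far from the threshold, where ReLU is effectively linear.
shift_far = mean_shift(5.0, sigma=0.5, inside=True)

print(f"inside, at threshold:   {shift_inside:+.3f}")
print(f"outside, at threshold:  {shift_outside:+.3f}")
print(f"inside, far from it:    {shift_far:+.3f}")
```

Under these assumptions, the inside-noise shift at the threshold should approach sigma / sqrt(2*pi), while the other two cases stay near zero. A network trained with that offset present can absorb it into its weights and biases, which is one way to read the reported dependence on training noise.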
- RNNs trained with internal noise often perform optimally when noise remains at inference time, contradicting standard assumptions about noise removal.
- Networks can overfit to training noise itself, not just input-output data, through noise-induced shifts in neural dynamics and fixed points.
- Noise injection location matters critically: internal noise causes preference for training noise levels while external noise does not.
- The phenomenon arises from asymmetric noise attenuation near activation function nonlinearities where networks preferentially operate.
- Findings have implications for biological neural network modeling and require rethinking noise management in deployed RNN systems.
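To make the two injection sites concrete, here is a minimal sketch of a continuous-time RNN integrated with Euler steps. The dynamics `tau * dx/dt = -x + W r + B u`, the ReLU activation, the random untrained weights, and the names `rollout` and `noise_site` are all assumptions made for illustration; the source does not give the study's equations. The per-step noise is also left unscaled by the step size for simplicity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rollout(W, B, u, sigma, noise_site, steps=200, dt=0.1, tau=1.0, seed=0):
    """Euler-integrate tau * dx/dt = -x + W r + B u and return the final state.

    noise_site: "inside"  -> r = relu(x + xi)   (noise inside the activation)
                "outside" -> r = relu(x) + xi   (noise outside the activation)
    """
    if noise_site not in ("inside", "outside"):
        raise ValueError("noise_site must be 'inside' or 'outside'")
    rng = np.random.default_rng(seed)
    x = np.zeros(W.shape[0])
    for _ in range(steps):
        xi = rng.normal(0.0, sigma, size=x.shape)
        r = relu(x + xi) if noise_site == "inside" else relu(x) + xi
        x = x + (dt / tau) * (-x + W @ r + B @ u)
    return x

# Small random (untrained) network, scaled so the dynamics stay stable.
n_units, n_inputs = 8, 2
init = np.random.default_rng(1)
W = 0.5 * init.normal(size=(n_units, n_units)) / np.sqrt(n_units)
B = init.normal(size=(n_units, n_inputs))
u = np.ones(n_inputs)

# Sweep the test-time noise level for both injection sites.
for test_sigma in (0.0, 0.1, 0.3):
    x_in = rollout(W, B, u, test_sigma, "inside")
    x_out = rollout(W, B, u, test_sigma, "outside")
    print(test_sigma, np.linalg.norm(x_in - x_out))
```

With a trained network in place of the random weights, the same sweep over `test_sigma` is the kind of check that would reveal whether performance peaks at the training noise level rather than at zero noise.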