Researchers characterize the separation power of equivariant neural networks, demonstrating that non-polynomial activations like ReLU and sigmoid achieve equivalent maximum expressivity, while depth and architectural choices significantly influence a model's ability to distinguish inputs. This theoretical analysis provides a framework for comparing model expressivity and understanding the design principles behind convolutional and permutation-invariant networks.
This research addresses a fundamental question in machine learning theory: how well can different neural network architectures distinguish between different inputs? Separation power is a quantifiable measure of model expressivity, and it helps explain when and why certain architectures succeed or fail at learning tasks. The authors provide a complete mathematical characterization of which inputs become indistinguishable under specific architectural constraints, making the expressivity limits of a given design explicit.
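For reference, separation power is commonly formalized through the pairs of inputs that a model family cannot tell apart. The notation below ($\mathcal{F}$ for the family of models, $X$ for the input space, $\rho$ for the relation) is assumed here for illustration and is not necessarily the authors' own:

```latex
\rho(\mathcal{F}) = \{\, (x, y) \in X \times X \;:\; f(x) = f(y) \ \text{for all } f \in \mathcal{F} \,\}
```

Under this convention, a family $\mathcal{F}_1$ separates at least as well as $\mathcal{F}_2$ when $\rho(\mathcal{F}_1) \subseteq \rho(\mathcal{F}_2)$: the smaller the relation, the more inputs the family can distinguish.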
The findings challenge some conventional assumptions in deep learning. Because ReLU, sigmoid, and other non-polynomial activations all reach the same maximum separation power, the choice among them does not matter for expressivity purposes. Depth improves separation power only up to a threshold, which contradicts the intuition that "deeper is always better." For model design, this suggests that excessive depth may increase computational cost without meaningful expressivity gains.
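The role of the activation can be illustrated with a toy example. The sketch below is not the authors' construction: it assumes a hypothetical one-layer, sum-pooled permutation-invariant model `f(x) = sum_i act(w * x_i + b)` and a small hand-picked parameter grid, and checks whether any parameter setting gives different outputs on two inputs. The identity activation stands in for a polynomial activation.

```python
import math

def relu(t):
    return max(0.0, t)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def identity(t):
    # Degree-1 polynomial activation, used here as the polynomial baseline.
    return t

def invariant_model(x, w, b, act):
    """Hypothetical sum-pooled model: f(x) = sum_i act(w * x_i + b)."""
    return sum(act(w * xi + b) for xi in x)

def separates(x, y, act, params, tol=1e-9):
    """True if some (w, b) in params gives different outputs on x and y."""
    return any(
        abs(invariant_model(x, w, b, act) - invariant_model(y, w, b, act)) > tol
        for w, b in params
    )

# Small illustrative parameter grid (an assumption, not an exhaustive search).
params = [(w, b) for w in (-2.0, -1.0, 1.0, 2.0)
          for b in (-3.0, -2.0, -1.0, 0.0, 1.0)]

x = [1.0, 3.0]       # same sum as y, different multiset
y = [2.0, 2.0]
x_perm = [3.0, 1.0]  # a permutation of x

for name, act in [("relu", relu), ("sigmoid", sigmoid), ("identity", identity)]:
    print(name, "separates x from y:", separates(x, y, act, params))
```

In this toy setting, ReLU and sigmoid both separate `x` from `y`, while the identity activation cannot, since its output depends only on the sum of the entries. No activation separates `x` from `x_perm`, because sum pooling is permutation-invariant by construction.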
For machine learning practitioners and researchers, this work provides actionable design principles. The hierarchy of separation power across block decompositions offers a systematic method for comparing and selecting models. Because adding invariant features to hidden representations does not change separation power, designers can focus computational resources on the components that do affect expressivity. The framework also lets researchers predict expressivity limitations before training.
These results strengthen the theoretical foundations of neural network design. As the field moves toward more principled architecture selection, separation power analysis becomes a useful tool for validating design choices and understanding the fundamental limitations of different approaches.
- Non-polynomial activations including ReLU and sigmoid are equivalent in expressivity and achieve maximum separation power
- Depth improves separation power only until a threshold, beyond which additional layers provide no expressivity benefit
- Invariant features added to hidden representations do not impact model separation power or expressivity
- Block decomposition of representations creates a hierarchy enabling direct comparison of model separation power
- Complete characterization of indistinguishable inputs provides a framework for predicting and understanding neural network expressivity limits