How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs
Researchers establish a sharp bandwidth threshold for deep Gaussian processes, proving that below this threshold compositional GPs converge to non-Gaussian, non-degenerate limit distributions rather than degenerating to constant functions. This advances theoretical understanding of deep Bayesian models and their limiting behavior as network depth increases.
This research addresses a fundamental question in deep Bayesian machine learning: what happens to the prior distribution of deep Gaussian processes as the number of layers grows arbitrarily large. Previous work identified that under certain conditions, deep GPs degenerate to trivial constant functions, rendering them useless as probabilistic models. The new contribution identifies a precise phase transition characterized by a critical bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ that scales with input dimension.
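A short heuristic, which is not the paper's proof, makes the $\sqrt{d}$ scaling plausible. Assume (these choices are not specified in this summary) that every layer maps $\mathbb{R}^d \to \mathbb{R}^d$ with $d$ independent zero-mean GP coordinates sharing the unit-variance RBF kernel $k(u,v) = \exp(-\|u-v\|^2/(2r^2))$, and let $D_\ell$ denote the squared distance between the layer-$\ell$ images of two fixed inputs. Conditional on $D_\ell$, each coordinate difference is distributed as $\mathcal{N}(0,\, 2(1 - e^{-D_\ell/(2r^2)}))$, so, using $1 - e^{-x} \le x$,

$$
\mathbb{E}[D_{\ell+1} \mid D_\ell] = 2d\left(1 - e^{-D_\ell/(2r^2)}\right) \le \frac{d}{r^2}\, D_\ell .
$$

Under these assumptions the expected distance contracts geometrically whenever $r > \sqrt{d}$, which matches the order of $r_c(d)$ but says nothing about the sharp constant or about the nature of the limit below the threshold.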
The theoretical significance lies in demonstrating that deep Gaussian processes can maintain non-trivial limiting behavior when operating below this threshold. Importantly, the discovered limit distributions exhibit non-Gaussian properties with coordinate dependence, contrasting sharply with the wide-network regime where limits are Gaussian. This finding expands the landscape of viable deep Bayesian model architectures and suggests richer expressivity than previously understood.
The empirical validation across multiple dimensions reveals complex multimodal behavior in the limit distributions, a phenomenon that becomes increasingly difficult to detect in higher dimensions. Because the regime in which this behavior appears narrows as dimension grows, practitioners must tune bandwidth parameters carefully when employing deep GP architectures. The results have implications for designing more robust Bayesian deep learning models, particularly where uncertainty quantification matters.
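The collapse-versus-survival dichotomy can be probed numerically with a small Monte Carlo sketch. It is an illustration under stated assumptions, not the paper's experiment: each layer is assumed to map $\mathbb{R}^d$ to $\mathbb{R}^d$ with $d$ independent unit-variance RBF GP coordinates of bandwidth $r$, in which case the squared distance $D$ between the images of two inputs evolves as $D \mapsto 2(1 - e^{-D/(2r^2)})\,\chi^2_d$. The function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_sq_distance(d, r, depth, n_chains=2000, d0=1.0, seed=0):
    """Track the squared distance between two inputs through a deep GP prior.

    Assumed model (not taken from the paper): every layer has d independent
    zero-mean GP coordinates with kernel k(u, v) = exp(-|u - v|^2 / (2 r^2)).
    Returns the squared distances after `depth` layers, one per chain.
    """
    rng = np.random.default_rng(seed)
    D = np.full(n_chains, float(d0))
    for _ in range(depth):
        # 1 - k(u, v), computed with expm1 to stay accurate when D is tiny.
        one_minus_k = -np.expm1(-D / (2.0 * r**2))
        # Each coordinate difference is N(0, 2 (1 - k)); summing d squares
        # gives 2 (1 - k) times a chi-square variable with d degrees of freedom.
        D = 2.0 * one_minus_k * rng.chisquare(d, size=n_chains)
    return D


# Compare one bandwidth well below sqrt(d) with one well above it.
for r in (0.4, 20.0):
    D = simulate_sq_distance(d=4, r=r, depth=200)
    print(f"r={r}: median squared distance after 200 layers = {np.median(D):.3g}")
```

A median near zero indicates that the two inputs have been mapped to essentially the same point, the signature of a degenerate constant limit, while a median of order $d$ indicates that the prior still distinguishes them. The bandwidth at which this toy recursion switches behavior need not coincide with the paper's $r_c(d)$, whose parameterization is not given here.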
Future work should explore whether these non-Gaussian limits offer practical advantages in downstream tasks, whether the theoretical insights transfer to other kernel choices beyond RBF, and how to efficiently sample from or approximate these complex limit distributions in practice.
- A sharp bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ separates degenerate from non-degenerate deep GP limits
- Below the threshold, deep GP priors converge to non-Gaussian distributions with preserved coordinate dependence
- The limiting behavior becomes increasingly constrained in higher dimensions, creating narrower viable parameter regimes
- Deep Gaussian processes can maintain non-trivial probabilistic properties at arbitrary depth under proper parameterization
- Empirical multimodal behavior in limit distributions suggests complex expressivity beyond standard Gaussian process theory