Date: Tue, 01 Nov 2022
Time: 12:30 - 13:00
Location: C3
Speaker: Alain Rossier

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation (SDE), or neither of these. Furthermore, we prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We further prove that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite 2-variation.
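The abstract contrasts the classical neural ODE scaling with the alternative regimes the speaker studies. A minimal numerical sketch of the underlying depth-scaled residual recursion is given below; it is not the speaker's code, and the scaling exponents, weight choices, and parameter values are illustrative assumptions chosen to mimic the two limiting behaviours mentioned (smooth ODE-type weights vs. independent, diffusively scaled weights).

```python
import numpy as np

# Depth-scaled residual recursion, a sketch of the setting in the abstract:
#     x_{k+1} = x_k + L^{-beta} * sigma(W_k @ x_k),   k = 0, ..., L-1.
# beta = 1 with weights varying smoothly in k/L corresponds to the neural ODE
# limit dx/dt = sigma(W(t) x); beta = 1/2 with i.i.d. weights is the scaling
# under which an SDE-type limit can arise instead. All names and parameter
# choices here are illustrative assumptions, not the speaker's setup.

def resnet_forward(x0, weights, beta):
    """Run the residual recursion for a stack of L layers."""
    L = len(weights)
    x = x0.copy()
    for W in weights:
        x = x + L ** (-beta) * np.tanh(W @ x)  # smooth activation
    return x

rng = np.random.default_rng(0)
d, L = 8, 512
x0 = rng.normal(size=d)

# ODE-type scaling: beta = 1, weights a smooth function of the layer index k/L.
smooth_weights = [np.sin(2 * np.pi * k / L) * np.eye(d) for k in range(L)]
print(resnet_forward(x0, smooth_weights, beta=1.0))

# Diffusive scaling: beta = 1/2, independent weights at each layer.
iid_weights = [rng.normal(scale=1 / np.sqrt(d), size=(d, d)) for _ in range(L)]
print(resnet_forward(x0, iid_weights, beta=0.5))
```

Increasing L under the first scaling drives the output toward a deterministic ODE flow, while under the second the layer-to-layer increments behave like scaled random increments, which is the heuristic behind the SDE regime discussed in the talk.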
