Neural networks are undoubtedly successful in practical applications. However, a complete mathematical theory of why and when machine learning algorithms based on neural networks work has remained elusive. Although various representation theorems ensure the existence of the "perfect" parameters of the network, it has not been proved that these perfect parameters can be (efficiently) approximated by conventional algorithms, such as stochastic gradient descent. The difficulty is well known: the resulting optimisation problem is non-convex. In this talk we show how the optimisation problem becomes convex in the mean-field limit for one-hidden-layer networks and certain deep neural networks. Moreover, we present optimality criteria for the distribution of the network parameters and show that the nonlinear Langevin dynamics converges to this optimal distribution. This is joint work with Kaitong Hu, Zhenjie Ren and Lukasz Szpruch.
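The finite-particle picture behind the mean-field limit can be sketched numerically: each hidden neuron is a "particle", the network output is the empirical average over particles, and noisy gradient descent on the particles is the finite-N analogue of the nonlinear Langevin dynamics on the parameter distribution. The sketch below is a minimal illustration under assumed choices (tanh activation, toy sine-regression data, and the noise level `sigma`, step size `lr`, and particle count `N` are all hypothetical values, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200        # number of hidden neurons (particles) -- assumed value
sigma = 0.05   # Langevin noise level (temperature) -- assumed value
lr = 0.1       # step size -- assumed value
steps = 2000

# toy regression data: learn y = sin(x)
X = rng.uniform(-2.0, 2.0, size=50)
Y = np.sin(X)
M = len(X)

# each particle theta_i = (a_i, w_i) is one neuron of a one-hidden-layer net
a = rng.normal(0.0, 1.0, N)   # output weights
w = rng.normal(0.0, 1.0, N)   # input weights

def predict(a, w, x):
    # mean-field network: average of a_i * tanh(w_i * x) over particles
    return np.mean(a[:, None] * np.tanh(w[:, None] * x[None, :]), axis=0)

def loss(a, w):
    return 0.5 * np.mean((predict(a, w, X) - Y) ** 2)

initial_loss = loss(a, w)

for _ in range(steps):
    r = predict(a, w, X) - Y                 # residuals, shape (M,)
    phi = np.tanh(w[:, None] * X[None, :])   # activations, shape (N, M)
    # per-particle gradients (mean-field scaling: gradient of N * loss)
    grad_a = (phi @ r) / M
    grad_w = ((a[:, None] * (1.0 - phi**2) * X[None, :]) @ r) / M
    # Langevin update: gradient step plus independent Gaussian noise
    a -= lr * grad_a + sigma * np.sqrt(lr) * rng.normal(size=N)
    w -= lr * grad_w + sigma * np.sqrt(lr) * rng.normal(size=N)

final_loss = loss(a, w)
print(initial_loss, final_loss)
```

Despite the non-convexity in each individual particle, the empirical measure of the particles descends an objective that is convex in the distribution, which is why the noisy dynamics reliably reduces the loss.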
- Stochastic Analysis Seminar