Date
Tue, 21 Nov 2023
Time
16:00 - 17:00
Location
L6
Speaker
Thiziri Nait Saada
Organisation
Mathematical Institute (University of Oxford)

The infinitely wide neural network has proven to be a useful and tractable mathematical model for understanding many phenomena that appear in deep learning. One example is the convergence of random deep networks to Gaussian processes, which enables a rigorous analysis of how the choice of activation function and network weights affects the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call "pseudo-i.i.d."), including the established cases of i.i.d. and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with pseudo-i.i.d. distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
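The sketch below is not from the talk or the paper; it is a minimal Python/NumPy illustration of two ideas mentioned in the abstract, under standard assumptions: (1) pre-activations of a wide random layer look Gaussian for both i.i.d. and orthogonal weight initializations, and (2) the Edge-of-Chaos condition chi = sigma_w^2 * E[phi'(z)^2] = 1 (here with phi = tanh) that is used to tune networks at criticality. The symbols sigma_w, sigma_b and the helper function names are illustrative assumptions, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

def iid_weights(n_out, n_in, sigma_w):
    # i.i.d. Gaussian weights with variance sigma_w^2 / n_in
    return rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(n_out, n_in))

def orthogonal_weights(n, sigma_w):
    # random orthogonal weights scaled so that W @ W.T = sigma_w^2 * I
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return sigma_w * q

def preactivations(W, sigma_b, x):
    # one layer: pre-activation h = W phi(x) + b, with phi = tanh
    return W @ np.tanh(x) + rng.normal(0.0, sigma_b, size=W.shape[0])

# (1) Wide-layer pre-activations are close to Gaussian for both weight classes:
# the excess kurtosis of the empirical distribution should be close to 0.
n = 4000
x = rng.normal(size=n)
for name, W in [("iid", iid_weights(n, n, 1.5)),
                ("orthogonal", orthogonal_weights(n, 1.5))]:
    h = preactivations(W, 0.1, x)
    excess_kurtosis = ((h - h.mean())**4).mean() / h.var()**2 - 3.0
    print(f"{name:>10}: mean={h.mean():+.3f}, excess kurtosis={excess_kurtosis:+.3f}")

# (2) Edge-of-Chaos for tanh: iterate the variance map
#     q <- sigma_w^2 * E[tanh(sqrt(q) z)^2] + sigma_b^2,  z ~ N(0, 1),
# to its fixed point q*, then compute chi = sigma_w^2 * E[tanh'(sqrt(q*) z)^2].
def chi(sigma_w, sigma_b, n_mc=200_000):
    z = rng.normal(size=n_mc)
    q = 1.0
    for _ in range(100):  # fixed-point iteration of the variance map
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z)**2) + sigma_b**2
    dphi = 1.0 - np.tanh(np.sqrt(q) * z)**2  # derivative of tanh
    return sigma_w**2 * np.mean(dphi**2)

for sigma_w in (0.9, 1.1, 1.5):  # sigma_b fixed at 0.1 for illustration
    print(f"sigma_w={sigma_w:.2f}: chi={chi(sigma_w, 0.1):.3f}")
# chi < 1: ordered phase; chi > 1: chaotic phase; chi close to 1: criticality
# (the Edge-of-Chaos), where initialization is tuned to aid training.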
