Date
Mon, 03 Jun 2024
Time
14:00 - 15:00
Location
Lecture Room 3
Speaker
James Martens
Organisation
Google DeepMind

Modern neural network models are trained using fairly standard stochastic gradient optimizers, sometimes employing mild preconditioners. 
A natural question to ask is whether significant improvements in training speed can be obtained through the development of better optimizers. 
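For context (this illustration is not part of the abstract), a preconditioned stochastic gradient step can be sketched as

\theta_{t+1} = \theta_t - \eta_t \, P_t^{-1} \, \nabla_\theta \hat{L}(\theta_t),

where \hat{L} is the loss estimated on a mini-batch, \eta_t is the learning rate, and P_t is the preconditioner: the identity for plain SGD, a diagonal estimate of gradient second moments for optimizers such as Adam, or a richer curvature approximation such as K-FAC.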

In this talk I will argue that, in the large majority of cases, such improvements are not achievable, which explains why this area of research has stagnated. I will go on to identify several situations where improved preconditioners can still deliver significant speedups, including exotic architectures and loss functions, and large-batch training.
