Date
Mon, 03 Jun 2024
Time
14:00 - 15:00
Location
Lecture Room 3
Speaker
James Martens
Organisation
Google DeepMind

Modern neural network models are trained using fairly standard stochastic gradient optimizers, sometimes employing mild preconditioners. 
A natural question to ask is whether significant improvements in training speed can be obtained through the development of better optimizers. 
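For concreteness, a minimal sketch of such a preconditioned stochastic gradient update, using illustrative symbols not taken from the talk (parameters $\theta_t$, mini-batch gradient $g_t$, learning rate $\eta$, and preconditioner $P_t$, e.g. a diagonal second-moment estimate as in Adam-style methods):

$$\theta_{t+1} = \theta_t - \eta\, P_t^{-1} g_t, \qquad g_t = \nabla_\theta \mathcal{L}(\theta_t; B_t),$$

where $B_t$ is the mini-batch drawn at step $t$; plain SGD corresponds to the choice $P_t = I$.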

In this talk I will argue that, in the large majority of cases, no such improvement is possible, which explains why this area of research has stagnated. I will go on to identify several situations where improved preconditioners can still deliver significant speedups, including exotic architectures and loss functions, and large-batch training.
