Date
Mon, 03 Jun 2024
Time
14:00 - 15:00
Location
Lecture Room 3
Speaker
James Martens
Organisation
Google DeepMind

Modern neural network models are trained using fairly standard stochastic gradient optimizers, sometimes employing mild preconditioners. 
A natural question to ask is whether significant improvements in training speed can be obtained through the development of better optimizers. 
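For context (this illustration is not part of the abstract), a preconditioned stochastic gradient step can be sketched as

\theta_{t+1} = \theta_t - \eta_t \, P_t^{-1} \, \nabla_\theta \hat{L}(\theta_t),

where \hat{L} is the loss estimated on a mini-batch, \eta_t is the learning rate, and P_t is the preconditioner: the identity for plain SGD, a diagonal estimate of gradient second moments for optimizers such as Adam, or a richer curvature approximation such as K-FAC.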

In this talk I will argue that, in the large majority of cases, such improvements are not achievable, which explains why this area of research has stagnated. I will go on to identify several situations where improved preconditioners can still deliver significant speedups, including exotic architectures and loss functions, and large-batch training.
