Seminar series
Mon, 03 Oct 2022
14:00 - 15:00
L1 - tbc
Roman Novak

The common observation that wider neural networks (those with more hidden units, channels, or attention heads) perform better motivates studying them in the infinite-width limit.

Remarkably, infinitely wide networks can be described in closed form as Gaussian processes (GPs) at initialization, during training, and after training, whether that training is gradient-based or fully Bayesian. This provides closed-form test-set predictions and uncertainties from an infinitely wide network without ever instantiating it (!).

These infinitely wide networks have become powerful models in their own right, establishing several state-of-the-art (SOTA) results, and are used in applications including hyperparameter selection, neural architecture search, meta-learning, active learning, and dataset distillation.

The talk will provide a high-level overview of our work at Google Brain on infinite-width networks. In the first part, I will derive core results, providing intuition for why infinite-width networks are GPs. In the second part, I will discuss challenges and solutions to implementing and scaling up these GPs. In the third part, I will conclude with example applications made possible by infinite-width networks.

The talk does not assume familiarity with the topic beyond general ML background.

Last updated on 13 Sep 2022 21:30.