Thu, 13 Oct 2022

16:00 - 17:00
L3

MF-OMO: An Optimization Formulation of Mean-Field Games

Anran Hu
Abstract

The theory of mean-field games (MFGs) has recently experienced exponential growth. Existing analytical approaches to finding Nash equilibrium (NE) solutions for MFGs are, however, by and large restricted to contractive or monotone settings, or rely on the uniqueness of the NE. We propose a new mathematical paradigm to analyze discrete-time MFGs without any of these restrictions. The key idea is to reformulate the problem of finding NE solutions in MFGs as solving an equivalent optimization problem, called MF-OMO (Mean-Field Occupation Measure Optimization), with bounded variables and trivial convex constraints. It builds on the classical reformulation of a Markov decision process as a linear program, adds the consistency constraint for MFGs in terms of occupation measures, and exploits the complementarity structure of the linear program. This equivalence framework enables finding multiple (and possibly all) NE solutions of MFGs by standard algorithms such as projected gradient descent, with convergence guarantees under appropriate conditions. In particular, analyzing MFGs with linear rewards and mean-field-independent dynamics is reduced to solving a finite number of linear programs, hence solvable in finite time. This optimization reformulation of MFGs can be extended to variants of MFGs such as personalized MFGs.
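
As background, the classical linear-programming reformulation that MF-OMO builds on expresses a finite, discounted MDP in terms of its occupation measure (the exact formulation used for MF-OMO, e.g. finite-horizon versus discounted, may differ from this sketch):
\[
\max_{d \ge 0} \ \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{a} d(s',a) = (1-\gamma)\,\mu_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a) \ \ \forall s',
\]
where \(d\) is the normalized discounted occupation measure, \(\mu_0\) the initial distribution and \(\gamma\) the discount factor; an optimal \(d\) recovers an optimal policy via \(\pi(a \mid s) \propto d(s,a)\). MF-OMO augments a program of this type with the mean-field consistency constraint linking the occupation measure to the population distribution.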

Mon, 14 Nov 2022
14:00
L4

A dynamical system perspective of optimization in data science

Jalal Fadili
(CNRS-ENSICAEN-Université Caen)
Abstract

In this talk, I will discuss how the dynamical-system perspective provides deep insight into the convergence guarantees of first-order algorithms involving inertial features for convex optimization in a Hilbert space setting.

Such algorithms are widely popular in various areas of data science (data processing, machine learning, inverse problems, etc.).
They can be viewed as discrete-time versions of an inertial second-order dynamical system involving different types of damping (viscous damping, Hessian-driven geometric damping).

The dynamical system perspective offers not only a powerful way to understand the geometry underlying the dynamics, but also a versatile framework to obtain fast, scalable and new algorithms enjoying nice convergence guarantees (including fast rates). In addition, this framework encompasses known algorithms and dynamics such as Nesterov-type accelerated gradient methods, and the introduction of time scale factors makes it possible to further accelerate these algorithms. The framework is versatile enough to handle non-smooth and non-convex objectives that are ubiquitous in various applications.
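
A representative system of this type studied in this line of work (the exact dynamics and parameter regimes covered in the talk may differ) is
\[
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^2 f(x(t))\,\dot{x}(t) + \nabla f(x(t)) = 0,
\]
where \(\alpha/t\) is the viscous damping coefficient (with \(\alpha = 3\) one recovers the continuous-time limit of Nesterov's accelerated gradient method) and \(\beta\,\nabla^2 f(x)\,\dot{x}\) is the Hessian-driven geometric damping; in discrete time the latter appears as a difference of successive gradients, so no Hessian evaluations are required.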

Mon, 31 Oct 2022
14:00
L4

Stochastic methods for derivative free optimization

Stephen Becker
(University of Colorado Boulder)
Abstract

Numerical optimization is an indispensable tool of modern data analysis, and there are many optimization problems where it is difficult or impossible to compute the full gradient of the objective function. The field of derivative free optimization (DFO) addresses these cases by using only function evaluations, and has wide-ranging applications from hyper-parameter tuning in machine learning to PDE-constrained optimization.

We present two projects that attempt to scale DFO techniques to higher dimensions. The first method converges slowly but works in very high dimensions, while the second method converges quickly but doesn't scale quite as well with dimension. The first method is a family of algorithms called "stochastic subspace descent" that uses a few directional derivatives at every step (i.e. projections of the gradient onto a random subspace). In special cases it is related to Spall's SPSA, Nesterov's Gaussian smoothing, and block-coordinate descent. We provide convergence analysis and discuss Johnson-Lindenstrauss-style concentration. The second method uses conventional interpolation-based trust region methods, which require large ill-conditioned linear algebra operations. We use randomized linear algebra techniques to ameliorate the issues and scale to larger dimensions; we also use a matrix-free approach that reduces memory issues. These projects are in collaboration with David Kozak, Luis Tenorio, Alireza Doostan, Kevin Doherty and Katya Scheinberg.
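
A minimal sketch of the flavour of the first method, using only function evaluations (the step size, number of directions and finite-difference scheme below are illustrative assumptions, not the choices analyzed in the original work):

    import numpy as np

    def stochastic_subspace_descent(f, x0, step=0.1, n_dirs=10, fd_h=1e-6,
                                    n_iters=2000, seed=0):
        """Illustrative sketch of a derivative-free stochastic subspace descent.

        At each iteration the gradient is estimated only along a few random
        orthonormal directions via forward finite differences, and a descent step
        is taken within the span of those directions.  All parameter choices here
        are illustrative.
        """
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        d = x.size
        for _ in range(n_iters):
            # Random d x n_dirs matrix with orthonormal columns: a random subspace.
            P, _ = np.linalg.qr(rng.standard_normal((d, n_dirs)))
            fx = f(x)
            # Forward-difference estimates of the directional derivatives P^T grad f(x),
            # costing n_dirs extra function evaluations per iteration.
            dirderivs = np.array([(f(x + fd_h * P[:, j]) - fx) / fd_h
                                  for j in range(n_dirs)])
            # Step along the projection of the (estimated) gradient onto the subspace;
            # rescaling by d / n_dirs would make it an unbiased full-gradient estimate.
            x -= step * (P @ dirderivs)
        return x

    # Toy usage: a 100-dimensional ill-conditioned quadratic.
    if __name__ == "__main__":
        a = np.linspace(1.0, 10.0, 100)
        f = lambda x: 0.5 * np.sum(a * x ** 2)
        x_final = stochastic_subspace_descent(f, x0=np.ones(100))
        print(f(np.ones(100)), "->", f(x_final))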

Mon, 10 Oct 2022
14:00
L4

Partitioned and multirate training of neural networks

Ben Leimkuhler
(Edinburgh University)
Abstract

I will discuss the use of partitioned schemes for neural networks. This work is in the tradition of multirate numerical ODE methods, in which different components of a system are evolved using different numerical methods or with different timesteps. The setting is the training tasks in deep learning, in which the parameters of a hierarchical model must be found to describe a given data set. By choosing appropriate partitionings of the parameters, some redundant computation can be avoided and substantial computational speed-ups can be obtained. I will demonstrate the use of the procedure in transfer learning applications from image analysis and natural language processing, showing a reduction of around 50% in training time without impairing the generalization performance of the resulting models. This talk describes joint work with Tiffany Vlaar.
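
As a schematic illustration of the multirate idea (the partitioning, update rule and frequencies below are illustrative assumptions, not the schemes from the talk), one parameter partition can be updated at every step while another is updated only every few steps:

    import numpy as np

    def multirate_gd(grad_fast, grad_slow, w_fast, w_slow,
                     lr=5e-3, slow_every=5, n_steps=2000):
        """Schematic multirate gradient descent on a partitioned parameter vector.

        The 'fast' partition is updated at every step; the 'slow' partition (e.g.
        early layers in a transfer-learning setting) only every `slow_every` steps,
        saving the corresponding gradient computations.  Illustrative only.
        """
        for step in range(n_steps):
            w_fast = w_fast - lr * grad_fast(w_fast, w_slow)
            if step % slow_every == 0:
                w_slow = w_slow - lr * grad_slow(w_fast, w_slow)
        return w_fast, w_slow

    # Toy usage: least squares split into two parameter blocks.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
    resid = lambda wf, ws: A[:, :5] @ wf + A[:, 5:] @ ws - b
    grad_f = lambda wf, ws: A[:, :5].T @ resid(wf, ws)
    grad_s = lambda wf, ws: A[:, 5:].T @ resid(wf, ws)
    wf, ws = multirate_gd(grad_f, grad_s, np.zeros(5), np.zeros(5))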

Mon, 03 Oct 2022

14:00 - 15:00
L1

Theory and Practice of Infinitely Wide Neural Networks

Roman Novak
(Google)
Abstract

The common observation that wider neural networks (in the number of hidden units, channels or attention heads) perform better motivates studying them in the infinite-width limit.

Remarkably, infinitely wide networks can be easily described in closed form as Gaussian processes (GPs) at initialization, during, and after training, whether that training is gradient-based or fully Bayesian. This provides closed-form test set predictions and uncertainties from an infinitely wide network without ever instantiating it (!).
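
Concretely, for a fully connected network with pointwise nonlinearity \(\phi\), weights of variance \(\sigma_w^2\) (suitably normalized by width) and biases of variance \(\sigma_b^2\), the layer-\((l+1)\) pre-activations converge, as the widths tend to infinity, to a zero-mean Gaussian process whose covariance obeys the recursion (a standard form of the result; scaling conventions vary across papers)
\[
K^{(l+1)}(x, x') = \sigma_b^2 + \sigma_w^2\, \mathbb{E}_{f \sim \mathcal{N}(0,\, K^{(l)})}\big[\phi(f(x))\,\phi(f(x'))\big],
\qquad
K^{(0)}(x, x') = \sigma_b^2 + \sigma_w^2\, \frac{x^\top x'}{d},
\]
so exact test-set predictions and uncertainties reduce to standard Gaussian-process regression with the final-layer kernel.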

These infinitely wide networks have become powerful models in their own right, establishing several SOTA results, and are used in applications including hyper-parameter selection, neural architecture search, meta learning, active learning, and dataset distillation.

The talk will provide a high-level overview of our work at Google Brain on infinite-width networks. In the first part I will derive core results, providing intuition for why infinite-width networks are GPs. In the second part I will discuss challenges and solutions to implementing and scaling up these GPs. In the third part, I will conclude with example applications made possible with infinite width networks.
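
As an illustration of what such closed-form inference looks like in code, here is a minimal sketch using the open-source Neural Tangents library developed at Google (toy data and hyper-parameters are placeholders, and API details can vary between library versions):

    import jax.numpy as jnp
    import neural_tangents as nt
    from neural_tangents import stax

    # Toy 1-D regression data (placeholders for illustration only).
    x_train = jnp.linspace(-1.0, 1.0, 20).reshape(-1, 1)
    y_train = jnp.sin(3.0 * x_train)
    x_test = jnp.linspace(-1.5, 1.5, 50).reshape(-1, 1)

    # A ReLU network specified purely by its architecture; kernel_fn computes the
    # corresponding infinite-width NNGP and NTK kernels in closed form (the widths
    # passed to Dense do not affect kernel_fn).
    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512), stax.Relu(),
        stax.Dense(512), stax.Relu(),
        stax.Dense(1),
    )

    # Closed-form predictive mean and covariance of the infinitely wide network
    # trained to convergence on MSE loss -- no finite network is ever instantiated.
    predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train,
                                                          diag_reg=1e-4)
    mean, cov = predict_fn(x_test=x_test, get='nngp', compute_cov=True)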

The talk does not assume familiarity with the topic beyond general ML background.

Tue, 15 Nov 2022
14:00
L6

Higher Dimensional Lubin-Tate Formal Group Laws

James Taylor
(University of Oxford)
Abstract

In this talk we will present some work in progress generalising Lubin-Tate formal group laws to higher dimensions. There have been some other generalisations, but ours is different in that the ring over which the formal group law is defined changes as the dimension increases. We will state some conjectures about these formal group laws, including their relationship to the Drinfeld tower over the p-adic upper half plane, and provide supporting evidence for these conjectures.
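
For orientation, recall the classical one-dimensional construction being generalised (standard background, not the new construction of the talk): for a local field F with ring of integers O_F, uniformizer \(\pi\) and residue field of cardinality q, one chooses a power series \(f \in O_F[[X]]\) with
\[
f(X) \equiv \pi X \pmod{\deg 2}, \qquad f(X) \equiv X^q \pmod{\pi},
\]
and the Lubin-Tate formal group law \(F_f \in O_F[[X, Y]]\) is the unique one admitting f as an endomorphism, \(f(F_f(X, Y)) = F_f(f(X), f(Y))\); different choices of f for the same \(\pi\) yield isomorphic formal groups.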

Tue, 11 Oct 2022
14:00
L6

A decomposition of the category of l-modular representations of SL_n(F).

Peiyi Cui
(University of East Anglia)
Abstract

Let F be a p-adic field, and k an algebraically closed field of characteristic l different from p. In this talk, we will first give a category decomposition of Rep_k(SL_n(F)), the category of smooth k-representations of SL_n(F), with respect to the GL_n(F)-equivalent supercuspidal classes of SL_n(F), which is not a block decomposition in general. We then give a block decomposition of the supercuspidal subcategory by introducing a partition on each GL_n(F)-equivalent supercuspidal class through type theory, and we interpret this partition in the sense of l-blocks of finite groups. We give an example where a block of Rep_k(SL_2(F)) is defined with several SL_2(F)-equivalent supercuspidal classes, which is different from the case where l is zero. We end this talk by giving a prediction on the block decomposition of Rep_k(A) for a general p-adic group A.
