Forthcoming events in this series

Mon, 10 Jun 2024

14:00 - 15:00
Lecture Room 3

Randomly pivoted Cholesky

Prof. Joel Tropp
(California Institute of Technology, USA)
André-Louis Cholesky entered École Polytechnique as a student in 1895. Before 1910, during his work as a surveyer for the French army, Cholesky invented a technique for solving positive-definite systems of linear equations. Cholesky's method can also be used to approximate a positive-semidefinite (psd) matrix using a small number of columns, called "pivots". A longstanding question is how to choose the pivot columns to achieve the best possible approximation.

This talk describes a simple but powerful randomized procedure for adaptively picking the pivot columns. This algorithm, randomly pivoted Cholesky (RPC), provably achieves near-optimal approximation guarantees. Moreover, in experiments, RPC matches or improves on the performance of alternative algorithms for low-rank psd approximation.

Cholesky died in 1918 from wounds suffered in battle. In 1924, Cholesky's colleague, Commandant Benoit, published his manuscript. One century later, a modern adaptation of Cholesky's method still yields state-of-the-art performance for problems in scientific machine learning.
Joint work with Yifan Chen, Ethan Epperly, and Rob Webber. Available at arXiv:2207.06503.


Mon, 03 Jun 2024

14:00 - 15:00
Lecture Room 3

Where Can Advanced Optimization Methods Help in Deep Learning?

James Martens
(Google Deep Mind)

Modern neural network models are trained using fairly standard stochastic gradient optimizers, sometimes employing mild preconditioners. 
A natural question to ask is whether significant improvements in training speed can be obtained through the development of better optimizers. 

In this talk I will argue that this is impossible in the large majority of cases, which explains why this area of research has stagnated. I will go on to identify several situations where improved preconditioners can still deliver significant speedups, including exotic architectures and loss functions, and large batch training. 

Mon, 27 May 2024

14:00 - 15:00
Lecture Room 3

Dynamic Sparsity: Routing Information through Neural Pathways

Edoardo Ponti
(University of Edinburgh)
Recent advancements in machine learning have caused a shift from traditional sparse modelling, which focuses on static feature selection in neural representations, to a paradigm based on selecting input or task-dependent pathways within neural networks. 
In fact, the ability to selectively (de)activate portions of neural computation graphs provides several advantages, including conditional computation, efficient parameter scaling, and compositional generalisation. 
In this talk, I will explore how sparse subnetworks can be identified dynamically and how parametric routing functions allow for recombining and locally adapting them in Large Language Models.


Mon, 20 May 2024

14:00 - 15:00
Lecture Room 3

Low rank approximation for faster optimization

Madeleine Udell
(Stanford University, USA)

Low rank structure is pervasive in real-world datasets.

This talk shows how to accelerate the solution of fundamental computational problems, including eigenvalue decomposition, linear system solves, composite convex optimization, and stochastic optimization (including deep learning), by exploiting this low rank structure.

We present a simple method based on randomized numerical linear algebra for efficiently computing approximate top eigende compositions, which can be used to replace large matrices (such as Hessians and constraint matrices) with low rank surrogates that are faster to apply and invert.

The resulting solvers for linear systems (NystromPCG), composite convex optimization (NysADMM), and stochastic optimization (SketchySGD and PROMISE) demonstrate strong theoretical and numerical support, outperforming state-of-the-art methods in terms of speed and robustness to hyperparameters.

Mon, 13 May 2024

14:00 - 15:00
Lecture Room 3

Compression of Graphical Data

Mihai Badiu
(Department of Engineering Science University of Oxford)

Data that have an intrinsic network structure can be found in various contexts, including social networks, biological systems (e.g., protein-protein interactions, neuronal networks), information networks (computer networks, wireless sensor networks),  economic networks, etc. As the amount of graphical data that is generated is increasingly large, compressing such data for storage, transmission, or efficient processing has become a topic of interest. In this talk, I will give an information theoretic perspective on graph compression. 

The focus will be on compression limits and their scaling with the size of the graph. For lossless compression, the Shannon entropy gives the fundamental lower limit on the expected length of any compressed representation. I will discuss the entropy of some common random graph models, with a particular emphasis on our results on the random geometric graph model. 

Then, I will talk about the problem of compressing a graph with side information, i.e., when an additional correlated graph is available at the decoder. Turning to lossy compression, where one accepts a certain amount of distortion between the original and reconstructed graphs, I will present theoretical limits to lossy compression that we obtained for the Erdős–Rényi and stochastic block models by using rate-distortion theory.

Mon, 06 May 2024

14:00 - 15:00
Lecture Room 3

Bayesian Interpolation with Linear and Shaped Neural Networks

Boris Hanin
(Princeton University)

This talk, based on joint work with Alexander Zlokapa, concerns Bayesian inference with neural networks. 

I will begin by presenting a result giving exact non-asymptotic formulas for Bayesian posteriors in deep linear networks. A key takeaway is the appearance of a novel scaling parameter, given by # data * depth / width, which controls the effective depth of the posterior in the limit of large model and dataset size. 

Additionally, I will explain some quite recent results on the role of this effective depth parameter in Bayesian inference with deep non-linear neural networks that have shaped activations.

Mon, 29 Apr 2024

11:00 - 12:00
Lecture Room 3

Deep Gaussian processes: theory and applications

Aretha Teckentrup
(University of Edinburgh)
Further Information

Please note that this seminar starts at 11am and finishes at 12pm. 


Deep Gaussian processes have proved remarkably successful as a tool for various statistical inference tasks. This success relates in part to the flexibility of these processes and their ability to capture complex, non-stationary behaviours. 

In this talk, we will introduce the general framework of deep Gaussian processes, in which many examples can be constructed, and demonstrate their superiority in inverse problems including computational imaging and regression.

 We will discuss recent algorithmic developments for efficient sampling, as well as recent theoretical results which give crucial insight into the behaviour of the methodology.


Mon, 22 Apr 2024

14:00 - 15:00
Lecture Room 3

Quantization of Bandlimited Graph Signals

Hanna Veselovska
(Technical University of Munich)

Graph signals provide a natural representation of data in many applications, such as social networks, web information analysis, sensor networks, and machine learning. Graph signal & data processing is currently an active field of mathematical research that aims to extend the well-developed tools for analyzing conventional signals to signals on graphs while exploiting the underlying connectivity. A key challenge in this context is the problem of quantization, that is, finding efficient ways of representing the values of graph signals with only a finite number of bits.

In this talk, we address the problem of quantizing bandlimited graph signals. We introduce two classes of noise-shaping algorithms for graph signals that differ in their sampling methodologies. We demonstrate that these algorithms can efficiently construct quantized representatives of bandlimited graph-based signals with bounded amplitude.

Inspired by the results of Zhang et al. in 2022, we provide theoretical guarantees on the relative error between the true signal and its quantized representative for one of the algorithms.
As will be discussed, the incoherence of the underlying graph plays an important role in the quantization process. Namely, bandlimited signals supported on graphs of lower incoherence allow for smaller relative errors. We support our findings with various numerical experiments showcasing the performance of the proposed quantization algorithms for bandlimited signals defined on graphs with different degrees of incoherence.

This is joint work with Felix Krahmer (Technical University of Munich), He Lyu (Meta), Rayan Saab (University of California San Diego), and Rongrong Wang (Michigan State University).

Mon, 08 Apr 2024

11:00 - 12:00
Lecture Room 3

Heavy-Tailed Large Deviations and Sharp Characterization of Global Dynamics of SGDs in Deep Learning

Chang-Han Rhee
(Northwestern University, USA)

While the typical behaviors of stochastic systems are often deceptively oblivious to the tail distributions of the underlying uncertainties, the ways rare events arise are vastly different depending on whether the underlying tail distributions are light-tailed or heavy-tailed. Roughly speaking, in light-tailed settings, a system-wide rare event arises because everything goes wrong a little bit as if the entire system has conspired up to provoke the rare event (conspiracy principle), whereas, in heavy-tailed settings, a system-wide rare event arises because a small number of components fail catastrophically (catastrophe principle). In the first part of this talk, I will introduce the recent developments in the theory of large deviations for heavy-tailed stochastic processes at the sample path level and rigorously characterize the catastrophe principle for such processes. 
The empirical success of deep learning is often attributed to the mysterious ability of stochastic gradient descents (SGDs) to avoid sharp local minima in the loss landscape, as sharp minima are believed to lead to poor generalization. To unravel this mystery and potentially further enhance such capability of SGDs, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGDs' global dynamics within complex non-convex loss landscapes. In the second part of this talk, I will characterize the global dynamics of SGDs building on the heavy-tailed large deviations and local stability framework developed in the first part. This leads to the heavy-tailed counterparts of the classical Freidlin-Wentzell and Eyring-Kramers theories. Moreover, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noises during the training phase, SGD can almost completely avoid sharp minima and hence achieve better generalization performance for the test data.  

This talk is based on the joint work with Mihail Bazhba, Jose Blanchet, Bohan Chen, Sewoong Oh, Zhe Su, Xingyu Wang, and Bert Zwart.




Chang-Han Rhee is an Assistant Professor in Industrial Engineering and Management Sciences at Northwestern University. Before joining Northwestern University, he was a postdoctoral researcher at Centrum Wiskunde & Informatica and Georgia Tech. He received his Ph.D. from Stanford University. His research interests include applied probability, stochastic simulation, experimental design, and the theoretical foundation of machine learning. His research has been recognized with the 2016 INFORMS Simulation Society Outstanding Publication Award, the 2012 Winter Simulation Conference Best Student Paper Award, the 2023 INFORMS George Nicholson Student Paper Competition (2nd place), and the 2013 INFORMS George Nicholson Student Paper Competition (finalist). Since 2022, his research has been supported by the NSF CAREER Award.  

Mon, 04 Mar 2024

14:00 - 15:00
Lecture Room 3

On transport methods for simulation-based inference and data assimilation

Prof Youssef Marzouk

Many practical Bayesian inference problems fall into the simulation-based or "likelihood-free" setting, where evaluations of the likelihood function or prior density are unavailable or intractable; instead one can only draw samples from the joint parameter-data prior. Learning conditional distributions is essential to the solution of these problems. 
To this end, I will discuss a powerful class of methods for conditional density estimation and conditional simulation based on transportation of measure. An important application for these methods lies in data assimilation for dynamical systems, where transport enables new approaches to nonlinear filtering and smoothing. 
To illuminate some of the theoretical underpinnings of these methods, I will discuss recent work on monotone map representations, optimization guarantees for learning maps from data, and the statistical convergence of transport-based density estimators.