Forthcoming events in this series


Mon, 28 Apr 2025

14:00 - 15:00
Lecture Room 3

Deep Learning for Inverse Problems: Theoretical Perspectives, Algorithms, and Applications

Professor Miguel Rodrigues, PhD, FIEEE
(University College London)
Abstract

Recent years have witnessed a surge of interest in deep learning methods to tackle inverse problems arising in various domains such as medical imaging, remote sensing, and the arts and humanities. This talk offers an overview of recent advances in the foundations and applications of deep learning for inverse problems, with a focus on model-based deep learning methods. Concretely, it will survey our work on theoretical advances in model-based learning, including learning guarantees, and on algorithmic advances in model-based learning; finally, it will showcase a portfolio of emerging signal & image processing challenges that benefit from model-based learning, including the image separation and deconvolution challenges arising in the arts and humanities.


Bio:

Miguel Rodrigues is a Professor of Information Theory and Processing at University College London, where he leads the Information, Inference and Machine Learning Lab and was the founder and director of the master's programme in Integrated Machine Learning Systems. He has also been the UCL Turing University Lead and a Turing Fellow at the Alan Turing Institute, the UK national institute for data science and artificial intelligence.

He has held appointments at various institutions worldwide, including Cambridge University, Princeton University, Duke University, and the University of Porto, Portugal. He obtained his undergraduate degree in Electrical and Computer Engineering from the Faculty of Engineering of the University of Porto and his PhD in Electronic and Electrical Engineering from University College London.

Dr. Rodrigues's research lies in the general areas of information theory, information processing, and machine learning. His most notable contributions span the information-theoretic analysis and design of communication systems, information-theoretic security, the information-theoretic analysis and design of sensing systems, and the information-theoretic foundations of machine learning.

He serves or has served as Editor of IEEE BITS, Editor of the IEEE Transactions on Information Theory, and Lead Guest Editor of various Special Issues of the IEEE Journal on Selected Topics in Signal Processing, Information and Inference, and Foundations and Trends in Signal Processing.

Dr. Rodrigues has received various prizes and awards, including the Prize for Merit from the University of Porto, the Prize Engenheiro Cristian Spratley, the Prize Engenheiro Antonio de Almeida, fellowships from the Portuguese Foundation for Science and Technology, and fellowships from the Calouste Gulbenkian Foundation. His research on information-theoretic security also received the 2011 IEEE Communications and Information Theory Societies Joint Paper Award.

He has also been elevated to Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for contributions to ‘multi-modal data processing and reliable and secure communications.’

Mon, 24 Feb 2025

14:00 - 15:00
Lecture Room 3

Single location regression and attention-based models

Claire Boyer
(Sorbonne University)
Abstract

Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.

This is joint work with Pierre Marion, Gérard Biau and Raphaël Berthier.
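To make the setup concrete, here is a minimal NumPy sketch of a toy version of the task and of a simplified attention-style predictor. The generative details (the boost size, the tanh readout) and all names are illustrative assumptions, not the paper's exact specification:

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, n = 16, 10, 2000

# Hypothetical toy generative model: each sequence has one relevant position,
# marked along a "location" direction k_true; the label is a linear readout
# (v_true) of the token at that position only.
k_true = rng.standard_normal(d); k_true /= np.linalg.norm(k_true)
v_true = rng.standard_normal(d); v_true /= np.linalg.norm(v_true)

X = rng.standard_normal((n, L, d))
J = rng.integers(0, L, size=n)                  # latent relevant position
X[np.arange(n), J] += 5.0 * k_true              # mark the relevant token
y = np.sign(X[np.arange(n), J] @ v_true)        # output depends on that token only

def predict(X, k, v):
    """Simplified attention-style predictor: softmax over token scores (dot
    products with k), then a non-linear readout of the attention-weighted token."""
    scores = X @ k                                           # (n, L)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    pooled = np.einsum('nl,nld->nd', w, X)                   # soft-selected token
    return np.tanh(pooled @ v)

acc = np.mean(np.sign(predict(X, k_true, v_true)) == y)
print(f"accuracy with oracle parameters: {acc:.3f}")         # close to 1
```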

Mon, 10 Feb 2025

14:00 - 15:00
Lecture Room 3

Of dice and games: A theory of generalized boosting

Nicolò Cesa-Bianchi
(University of Milano)
Abstract

Cost-sensitive loss functions are crucial in many real-world prediction problems, where different types of errors are penalized differently; for example, in medical diagnosis, a false negative prediction can lead to worse consequences than a false positive prediction. However, traditional learning theory has mostly focused on the symmetric zero-one loss, leaving cost-sensitive losses largely unaddressed. In this work, we extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses. Cost-sensitive losses assign costs to the entries of a confusion matrix, and are used to control the sum of prediction errors accounting for the cost of each error type. Multi-objective losses, on the other hand, simultaneously track multiple cost-sensitive losses, and are useful when the goal is to satisfy several criteria at once (e.g., minimizing false positives while keeping false negatives below a critical threshold). We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees that distinguishes which guarantees are trivial (i.e., can always be achieved), which ones are boostable (i.e., imply strong learning), and which ones are intermediate, implying non-trivial yet not arbitrarily accurate learning. For binary classification, we establish a dichotomy: a weak learning guarantee is either trivial or boostable. In the multiclass setting, we describe a more intricate landscape of intermediate weak learning guarantees. Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
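For concreteness, a cost-sensitive loss in the binary case can be computed directly from the confusion matrix; the cost values below are hypothetical:

```python
import numpy as np

# Hypothetical cost matrix: rows = true class, columns = predicted class.
# A false negative (true 1, predicted 0) costs five times a false positive.
C = np.array([[0.0, 1.0],
              [5.0, 0.0]])

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0])

# Confusion counts N[i, j] = number of examples with true class i predicted
# as j; the cost-sensitive loss is the cost-weighted average of the errors.
N = np.zeros_like(C)
np.add.at(N, (y_true, y_pred), 1)
loss = (C * N).sum() / len(y_true)
print(loss)  # one FN (cost 5) + one FP (cost 1) over 6 examples -> 1.0
```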

Mon, 03 Feb 2025

14:00 - 15:00
Lecture Room 3

Model-Based Deep Learning for Inverse Problems in Imaging

Pier Dragotti
(Imperial College)
Abstract

Inverse problems involve reconstructing unknown physical quantities from indirect measurements. They appear in various fields, including medical imaging (e.g., MRI, ultrasound, CT), materials science and molecular biology (e.g., electron microscopy), and remote sensing, to name a few examples. While deep neural networks are currently able to achieve state-of-the-art performance in many imaging tasks, in this talk we argue that many inverse imaging problems cannot be solved convincingly using a black-box solution. Instead, they require a well-crafted combination of computational tools taking the underlying signal, the physical constraints and the acquisition characteristics into account.


In the first part of the talk, we introduce INDigo+, a novel INN-guided probabilistic diffusion algorithm for arbitrary image restoration tasks. INDigo+ combines the perfect reconstruction property of invertible neural networks (INNs) with the strong generative capabilities of pre-trained diffusion models. Specifically, we leverage the invertibility of the network to condition the diffusion process and in this way we generate high quality restored images consistent with the measurements.

In the second part of the talk, we discuss unfolding techniques, an approach that allows embedding priors and models in the neural network architecture. In this context, we discuss the problem of monitoring the dynamics of large populations of neurons over a large area of the brain. Light-field microscopy (LFM), a type of scanless microscopy, is a particularly attractive candidate for the high-speed three-dimensional (3D) imaging needed for monitoring neural activity. We review fundamental aspects of LFM and then present computational methods based on deep learning for neuron localization and activity estimation from light-field data.
Finally, we look at the multi-modal case and present an application in art investigation. X-ray images of Old Master paintings often contain information about both the visible painting and a concealed sub-surface design; we therefore introduce a model-based neural network capable of separating from the “mixed X-ray” the X-ray image of the visible painting and that of the concealed design.

This is joint work with  A. Foust, P. Song, C. Howe, H. Verinaz, J. Huang, Di You and Y. Su from Imperial College London, M. Rodrigues and W. Pu from University College London, I. Daubechies from Duke University, Barak Sober from the Hebrew University of Jerusalem and C. Higgitt and N. Daly from The National Gallery in London.

Mon, 02 Dec 2024

14:00 - 15:00
Lecture Room 3

Enhancing Accuracy in Deep Learning using Marchenko-Pastur Distribution

Leonid Berlyand
(Penn State University)
Abstract

We begin with a short overview of Random Matrix Theory (RMT), focusing on the Marchenko-Pastur (MP) spectral approach. 

Next, we present recent analytical and numerical results on accelerating the training of Deep Neural Networks (DNNs) via MP-based pruning [1]. Furthermore, we show that combining this pruning with L2 regularization allows one to drastically decrease randomness in the weight layers and, hence, simplify the loss landscape. Moreover, we show that the DNN's weights become deterministic at any local minimum of the loss function.
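As a rough illustration of the MP-based idea (a hedged sketch; the pruning procedure of [1] is more elaborate), one can discard the singular values of a weight matrix that fall inside the Marchenko-Pastur noise bulk:

```python
import numpy as np

def mp_prune(W, sigma):
    """Sketch of Marchenko-Pastur-based pruning: treat singular values inside
    the MP noise bulk as pure noise and discard them. For an n x m matrix with
    iid entries of variance sigma^2, the noise singular values concentrate
    below the bulk edge sigma * (sqrt(n) + sqrt(m))."""
    n, m = W.shape
    edge = sigma * (np.sqrt(n) + np.sqrt(m))
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > edge
    return (U[:, keep] * s[keep]) @ Vt[keep], keep.sum()

rng = np.random.default_rng(1)
n, m, sigma = 400, 200, 0.1
# Planted rank-one signal plus iid noise.
signal = 5.0 * np.outer(rng.standard_normal(n), rng.standard_normal(m)) / np.sqrt(n * m)
W = signal + sigma * rng.standard_normal((n, m))
W_pruned, rank = mp_prune(W, sigma)
print(rank)  # 1: only the planted signal survives the MP edge
```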

Finally, we discuss our most recent results (in progress) on the generalization of the MP law to the input-output Jacobian matrix of the DNN. Here, our focus is on the existence of fixed points. The numerical examples cover several types of DNNs: fully connected networks, CNNs and ViTs. These works are done jointly with PSU PhD students M. Kiyashko, Y. Shmalo, L. Zelong and with E. Afanasiev and V. Slavin (Kharkiv, Ukraine).


[1] Berlyand, Leonid, et al. "Enhancing accuracy in deep learning using random matrix theory." Journal of Machine Learning (2024).

Mon, 25 Nov 2024

14:00 - 15:00
Lecture Room 3

Ease-controlled Incremental Gradient-type Algorithm for nonconvex finite-sum optimization

Laura Palagi
(Sapienza University of Rome)
Abstract

We consider minimizing the sum of a large number of smooth and possibly non-convex functions, which is the typical problem encountered in the training of deep neural networks on large-size datasets. 

Improving upon the Controlled Minibatch Algorithm (CMA) scheme proposed by Liuzzi et al. (2022), we propose CMALight, an ease-controlled incremental gradient (IG)-like method. The control of the IG iteration is performed by means of a costless watchdog rule and a derivative-free line search that activates only sporadically to guarantee convergence. The scheme also allows controlling the update of the learning rate used in the main IG iteration, avoiding preset rules and thus overcoming another tricky aspect of implementing online methods.
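The control flow can be sketched as follows (a loose illustration of the ease-controlled idea, with made-up test constants; the paper's exact acceptance and line-search rules differ):

```python
import numpy as np

def ease_controlled_ig(fs, grads, x0, lr=1e-2, epochs=50, check_every=10,
                       gamma=1e-6, tau=0.5):
    """Loose sketch of an ease-controlled incremental-gradient loop: plain IG
    epochs, with a watchdog that only occasionally evaluates the full objective
    and, on failure, triggers a derivative-free backtracking line search along
    the aggregate epoch direction (illustrative rules, not the paper's)."""
    f = lambda x: sum(fi(x) for fi in fs)
    x, f_ref = x0.copy(), f(x0)
    for epoch in range(epochs):
        x_start = x.copy()
        for gi in grads:                        # standard IG inner loop
            x = x - lr * gi(x)
        if (epoch + 1) % check_every:           # watchdog asleep: zero cost
            continue
        if f(x) <= f_ref - gamma:               # sufficient-decrease test
            f_ref = f(x)
            continue
        d, alpha = x - x_start, 1.0             # derivative-free line search
        while alpha > 1e-8 and f(x_start + alpha * d) > f_ref - gamma:
            alpha *= tau
        x = x_start + alpha * d
        f_ref, lr = f(x), lr * tau              # also shrink the learning rate
    return x

# Toy nonconvex finite sum: f_i(x) = sin(a_i x) + 0.1 (x - b_i)^2.
rng = np.random.default_rng(2)
a, b = rng.uniform(0.5, 2.0, 20), rng.standard_normal(20)
fs = [lambda x, ai=ai, bi=bi: np.sin(ai * x) + 0.1 * (x - bi) ** 2
      for ai, bi in zip(a, b)]
grads = [lambda x, ai=ai, bi=bi: ai * np.cos(ai * x) + 0.2 * (x - bi)
         for ai, bi in zip(a, b)]
print(ease_controlled_ig(fs, grads, np.array(0.0)))
```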

Convergence to a stationary point holds under the sole assumption of Lipschitz continuity of the gradients of the component functions, without knowing the Lipschitz constant or imposing any growth assumptions on the norm of the gradients.

We present two sets of computational tests. First, we compare CMALight against state-of-the-art mini-batch algorithms for training standard deep networks on large-size datasets, and deep convolutional neural networks and residual networks on standard image classification tasks on CIFAR10 and CIFAR100. 

Results show that CMALight easily scales up to problems with millions of variables and has an advantage over its state-of-the-art competitors.

Finally, we present computational results on generative tasks, testing CMALight's scaling capabilities on image generation with diffusion models (U-Net architecture). CMALight achieves better test performance and is more efficient than standard SGD with weight decay, thus reducing the computational burden (and the carbon footprint) of the training process.

Laura Palagi

Department of Computer, Control and Management Engineering,

Sapienza University of Rome, Italy


Joint work with 

Corrado Coppola

Giampaolo Liuzzi

Lorenzo Ciarpaglini


Mon, 18 Nov 2024

14:00 - 15:00
Lecture Room 3

Model-based (unfolding) neural networks and where to find them: from practice to theory

Vicky Kouni
Abstract

In recent years, a new class of deep neural networks has emerged, which finds its roots in model-based iterative algorithms for solving inverse problems. We call these model-based neural networks deep unfolding networks (DUNs). The term reflects their formulation: the iterations of optimization algorithms are “unfolded” as layers of neural networks, which solve the inverse problem at hand. Ever since their advent, DUNs have been employed for tackling assorted problems, e.g., compressed sensing (CS), denoising, super-resolution, and pansharpening.
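As a concrete instance of unfolding, the classical LISTA construction turns ISTA iterations for sparse recovery into network layers. The sketch below runs the untrained special case in which the layer parameters coincide with the ISTA matrices; in a real DUN these would be learned:

```python
import numpy as np

def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def unfolded_ista(y, W_e, W_t, thetas):
    """Forward pass of a LISTA-style DUN: each layer is one 'unfolded' ISTA
    iteration x = soft(W_e y + W_t x, theta), with the matrices and the
    per-layer thresholds treated as learnable parameters."""
    x = np.zeros(W_t.shape[1])
    for theta in thetas:
        x = soft_threshold(W_e @ y + W_t @ x, theta)
    return x

# Untrained special case = classical ISTA: W_e = A^T/L, W_t = I - A^T A/L.
rng = np.random.default_rng(3)
m, n, k = 50, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
lam = 0.01
x_hat = unfolded_ista(y, A.T / L, np.eye(n) - A.T @ A / L, [lam / L] * 200)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # small
```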

In this talk, we will revisit the application of DUNs to the CS problem, which pertains to reconstructing data from incomplete observations. We will present recent trends regarding the broader family of DUNs for CS and dive into their theory, which mainly revolves around their generalization performance; the latter is important because it informs us about the behaviour of a neural network on examples it has never been trained on.
In particular, we will focus on overparameterized DUNs, which exhibit remarkable performance in terms of reconstruction and generalization error. As supported by our theoretical and empirical findings, the generalization performance of overparameterized DUNs depends on their structural properties. Our analysis sets a solid mathematical ground for developing more stable, robust, and efficient DUNs, boosting their real-world performance.

Mon, 11 Nov 2024

14:00 - 15:00
Lecture Room 3

Understanding the learning dynamics of self-predictive representation learning

Yunhao Tang
(Google DeepMind)
Abstract

Self-predictive learning (aka non-contrastive learning) has become an increasingly important paradigm for representation learning. Self-predictive learning is simple yet effective: it learns without contrastive examples yet extracts useful representations through a self-predictive objective. A common puzzle with self-predictive learning is that the optimization objective itself admits trivial representations as globally optimal solutions, yet practical implementations produce meaningful solutions.


We reconcile the theory-practice gap by studying the learning dynamics of self-predictive learning. Our analysis is based on a non-linear ODE system that sheds light on why, despite a seemingly problematic optimization objective, self-predictive learning does not collapse, which echoes important implementation "tricks" in practice. Our results also show that in a linear setup, self-predictive learning can be understood as gradient-based PCA or SVD on the data matrix, hinting at the meaningful representations captured through the learning process.
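The flavour of the linear-setting result can be illustrated with a generic gradient-based PCA flow (Oja's rule); this is an illustration of the PCA connection, not the paper's exact ODE system:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 2000, 10, 2

# Data whose covariance has a dominant two-dimensional subspace.
X = rng.standard_normal((n, d)) * np.r_[3.0, 2.0, np.ones(d - 2)]
Sigma = X.T @ X / n

# Oja-style flow dW/dt = (I - W W^T) Sigma W, a gradient-based PCA iteration.
W = 0.1 * rng.standard_normal((d, k))
for _ in range(2000):
    W += 0.01 * (np.eye(d) - W @ W.T) @ Sigma @ W

# The learned subspace aligns with the top-k eigenvectors of Sigma.
top = np.linalg.eigh(Sigma)[1][:, -k:]
Q = np.linalg.qr(W)[0]
print(np.linalg.norm(Q @ Q.T - top @ top.T))   # close to 0
```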


This talk is based on our ICML 2023 paper "Understanding self-predictive learning for reinforcement learning".

Mon, 04 Nov 2024

14:00 - 15:00
Lecture Room 3

Efficient high-resolution refinement in cryo-EM with stochastic gradient descent

Bogdan Toader
(MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus)
Abstract

Electron cryomicroscopy (cryo-EM) is an imaging technique widely used in structural biology to determine the three-dimensional structure of biological molecules from noisy two-dimensional projections with unknown orientations. As the typical pipeline involves processing large amounts of data, efficient algorithms are crucial for fast and reliable results. The stochastic gradient descent (SGD) algorithm has been used to improve the speed of ab initio reconstruction, which produces a first, low-resolution estimate of the volume representing the molecule of interest, but it has yet to be applied successfully in the high-resolution regime, where expectation-maximization algorithms achieve state-of-the-art results, albeit at a high computational cost.
In this work, we investigate the conditioning of the optimisation problem and show that the large condition number prevents the successful application of gradient descent-based methods at high resolution. 
Our results include a theoretical analysis of the condition number of the optimisation problem in a simplified setting where the individual projection directions are known, an algorithm based on computing a diagonal preconditioner using Hutchinson's diagonal estimator, and numerical experiments showing the improvement in the convergence speed when using the estimated preconditioner with SGD. The preconditioned SGD approach can potentially enable a simple and unified approach to ab initio reconstruction and high-resolution refinement with faster convergence speed and higher flexibility, and our results are a promising step in this direction.
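Hutchinson's diagonal estimator itself is easy to state: for Rademacher probe vectors z, the expectation of z * (A z) (entrywise product) is the diagonal of A. A minimal sketch follows, with the cryo-EM operator replaced by an explicit test matrix:

```python
import numpy as np

def hutchinson_diag(matvec, n, num_samples=1000, rng=None):
    """Hutchinson's diagonal estimator: diag(A) is approximated by the sample
    mean of z * (A z) over Rademacher probes z. Only matvecs with A are
    needed, which is what makes it viable for large implicit operators."""
    rng = rng or np.random.default_rng()
    est = np.zeros(n)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)
        est += z * matvec(z)
    return est / num_samples

# Sanity check on an explicit psd matrix (in applications A is matvec-only).
rng = np.random.default_rng(5)
G = rng.standard_normal((100, 100))
A = G @ G.T
d_est = hutchinson_diag(lambda v: A @ v, 100, num_samples=2000, rng=rng)
print(np.max(np.abs(d_est - np.diag(A)) / np.diag(A)))  # modest relative error
```

The inverse of such a diagonal estimate can then be used to rescale the stochastic gradients, which is the preconditioning role it plays in this setting.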

Mon, 14 Oct 2024

14:00 - 15:00
Lecture Room 3

Complexity of Finding Local Minima in Continuous Optimization

Amir Ali Ahmadi
(Princeton University, NJ)
Abstract


Can we efficiently find a local minimum of a nonconvex continuous optimization problem? 

We give a rather complete answer to this question for optimization problems defined by polynomial data. In the unconstrained case, the answer remains positive for polynomials of degree up to three: We show that while the seemingly easier task of finding a critical point of a cubic polynomial is NP-hard, the complexity of finding a local minimum of a cubic polynomial is equivalent to the complexity of semidefinite programming. In the constrained case, we prove that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c\geq 0$) of a local minimum of an $n$-variate quadratic polynomial over a polytope.
This result (with $c=0$) answers a question of Pardalos and Vavasis that appeared on a list of seven open problems in complexity theory for numerical optimization in 1992.

Based on joint work with Jeffrey Zhang (Yale).


Biography

Amir Ali Ahmadi is a Professor at the Department of Operations Research and Financial Engineering at Princeton University and an Associated Faculty member of the Program in Applied and Computational Mathematics, the Department of Computer Science, the Department of Mechanical and Aerospace Engineering, the Department of Electrical Engineering, and the Center for Statistics and Machine Learning. He serves as the Director of the Certificate Program in Optimization and Quantitative Decision Science. He has also held visiting appointments in industry, as a Visiting Senior Optimization Fellow at Citadel, Global Quantitative Strategies, and a Visiting Research Scientist at Google Brain (in the Robotics group). Amir Ali received his PhD in EECS from MIT and was a Goldstine Fellow at the IBM Watson Research Center prior to joining Princeton. His research interests are in optimization theory, computational aspects of dynamical systems, control-oriented learning, and algorithms and complexity.

Amir Ali's distinctions include the Sloan Fellowship in Computer Science, the Presidential Early Career Award for Scientists and Engineers (PECASE), the NSF CAREER Award, the AFOSR Young Investigator Award, the DARPA Faculty Award, the Google Faculty Award, the MURI award of the AFOSR, the Howard B. Wentz Junior Faculty Award, as well as the Innovation Award of Princeton University, the Goldstine Fellowship of IBM Research, and the Oberwolfach Fellowship of the NSF. His undergraduate course at Princeton (ORF 363, “Computing and Optimization”) is a three-time recipient of the Teaching Award of the Princeton Engineering Council, as well as a recipient of the Excellence in Teaching of Operations Research Award of the Institute for Industrial and Systems Engineers, the Princeton SEAS Distinguished Teaching Award, and the Phi Beta Kappa Award for Excellence in Undergraduate Teaching at Princeton. Amir Ali's research has been recognized by a number of best-paper awards, including the INFORMS Optimization Society's Young Researchers Prize, the INFORMS Computing Society Prize (for best series of papers at the interface of operations research and computer science), the best conference paper award of the IEEE International Conference on Robotics and Automation, and the best paper prize of the SIAM Journal on Control and Optimization. Amir Ali was a plenary speaker at the 2021 SIAM Conference on Optimization and the 2022 Colombian Conference on Applied and Industrial Mathematics.


Mon, 10 Jun 2024

14:00 - 15:00
Lecture Room 3

Randomly pivoted Cholesky

Prof. Joel Tropp
(California Institute of Technology, USA)
Abstract
André-Louis Cholesky entered École Polytechnique as a student in 1895. Before 1910, during his work as a surveyor for the French army, Cholesky invented a technique for solving positive-definite systems of linear equations. Cholesky's method can also be used to approximate a positive-semidefinite (psd) matrix using a small number of columns, called "pivots". A longstanding question is how to choose the pivot columns to achieve the best possible approximation.

This talk describes a simple but powerful randomized procedure for adaptively picking the pivot columns. This algorithm, randomly pivoted Cholesky (RPC), provably achieves near-optimal approximation guarantees. Moreover, in experiments, RPC matches or improves on the performance of alternative algorithms for low-rank psd approximation.
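The algorithm itself is short. Below is a sketch of RPC following the description in the paper (arXiv:2207.06503); the RBF kernel test matrix is an illustrative choice:

```python
import numpy as np

def rp_cholesky(A, k, rng=None):
    """Randomly pivoted Cholesky: rank-k psd approximation A ~ F F^T, choosing
    each pivot column at random with probability proportional to the current
    diagonal of the residual."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()
    for t in range(k):
        i = rng.choice(n, p=d / d.sum())           # adaptive random pivot
        g = A[:, i] - F[:, :t] @ F[i, :t]          # residual of column i
        F[:, t] = g / np.sqrt(g[i])
        d = np.maximum(d - F[:, t] ** 2, 0.0)      # update residual diagonal
    return F

# Low-rank approximation of an RBF kernel matrix.
rng = np.random.default_rng(6)
x = rng.standard_normal((500, 3))
A = np.exp(-np.sum((x[:, None] - x[None]) ** 2, axis=-1))
F = rp_cholesky(A, 50, rng)
print(np.linalg.norm(A - F @ F.T) / np.linalg.norm(A))  # small relative error
```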

Cholesky died in 1918 from wounds suffered in battle. In 1924, Cholesky's colleague, Commandant Benoit, published his manuscript. One century later, a modern adaptation of Cholesky's method still yields state-of-the-art performance for problems in scientific machine learning.
 
Joint work with Yifan Chen, Ethan Epperly, and Rob Webber. Available at arXiv:2207.06503.



Mon, 03 Jun 2024

14:00 - 15:00
Lecture Room 3

Where Can Advanced Optimization Methods Help in Deep Learning?

James Martens
(Google DeepMind)
Abstract

Modern neural network models are trained using fairly standard stochastic gradient optimizers, sometimes employing mild preconditioners. 
A natural question to ask is whether significant improvements in training speed can be obtained through the development of better optimizers. 

In this talk I will argue that this is impossible in the large majority of cases, which explains why this area of research has stagnated. I will go on to identify several situations where improved preconditioners can still deliver significant speedups, including exotic architectures and loss functions, and large batch training. 

Mon, 27 May 2024

14:00 - 15:00
Lecture Room 3

Dynamic Sparsity: Routing Information through Neural Pathways

Edoardo Ponti
(University of Edinburgh)
Abstract
Recent advancements in machine learning have caused a shift from traditional sparse modelling, which focuses on static feature selection in neural representations, to a paradigm based on selecting input or task-dependent pathways within neural networks. 
In fact, the ability to selectively (de)activate portions of neural computation graphs provides several advantages, including conditional computation, efficient parameter scaling, and compositional generalisation. 
 
In this talk, I will explore how sparse subnetworks can be identified dynamically and how parametric routing functions allow for recombining and locally adapting them in Large Language Models.



Mon, 20 May 2024

14:00 - 15:00
Lecture Room 3

Low rank approximation for faster optimization

Madeleine Udell
(Stanford University, USA)
Abstract

Low rank structure is pervasive in real-world datasets.

This talk shows how to accelerate the solution of fundamental computational problems, including eigenvalue decomposition, linear system solves, composite convex optimization, and stochastic optimization (including deep learning), by exploiting this low rank structure.

We present a simple method based on randomized numerical linear algebra for efficiently computing approximate top eigendecompositions, which can be used to replace large matrices (such as Hessians and constraint matrices) with low-rank surrogates that are faster to apply and invert.
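A standard way to realize this is the randomized Nyström approximation, sketched below from matrix-vector products only. The small stabilizing shift is a common implementation detail, and the solvers named in this abstract build preconditioners on top of such a factorization:

```python
import numpy as np

def randomized_nystrom(matvec, n, rank, rng=None):
    """Randomized Nystrom approximation of a psd matrix from `rank` matvecs:
    A ~ (A Omega)(Omega^T A Omega)^+ (A Omega)^T, returned as an
    eigendecomposition U, lam with A ~ U diag(lam) U^T."""
    rng = rng or np.random.default_rng()
    Omega = np.linalg.qr(rng.standard_normal((n, rank)))[0]
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(rank)])  # A Omega
    nu = 1e-8 * np.linalg.norm(Y)               # small stabilizing shift
    Y_nu = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y_nu)
    B = np.linalg.solve(C, Y_nu.T).T            # B = Y_nu C^{-T}, so A ~ B B^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U, np.maximum(s ** 2 - nu, 0.0)

rng = np.random.default_rng(7)
G = rng.standard_normal((300, 40))
A = G @ G.T                                     # psd, exact rank 40
U, lam = randomized_nystrom(lambda v: A @ v, 300, 60, rng)
print(np.linalg.norm(A - (U * lam) @ U.T) / np.linalg.norm(A))  # tiny
```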

The resulting solvers for linear systems (NystromPCG), composite convex optimization (NysADMM), and stochastic optimization (SketchySGD and PROMISE) enjoy strong theoretical and numerical support, outperforming state-of-the-art methods in terms of speed and robustness to hyperparameters.

Mon, 13 May 2024

14:00 - 15:00
Lecture Room 3

Compression of Graphical Data

Mihai Badiu
(Department of Engineering Science, University of Oxford)
Abstract

Data that have an intrinsic network structure can be found in various contexts, including social networks, biological systems (e.g., protein-protein interactions, neuronal networks), information networks (computer networks, wireless sensor networks), economic networks, etc. As the amount of graphical data being generated grows ever larger, compressing such data for storage, transmission, or efficient processing has become a topic of interest. In this talk, I will give an information-theoretic perspective on graph compression.

The focus will be on compression limits and their scaling with the size of the graph. For lossless compression, the Shannon entropy gives the fundamental lower limit on the expected length of any compressed representation. I will discuss the entropy of some common random graph models, with a particular emphasis on our results on the random geometric graph model. 
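As a point of reference (a standard fact, stated here for concreteness), in the Erdős–Rényi model $G(n,p)$ the edges are independent Bernoulli($p$) variables, so the entropy is available in closed form:

$$ H\bigl(G(n,p)\bigr) = \binom{n}{2}\, h(p), \qquad h(p) = -p \log_2 p - (1-p)\log_2(1-p), $$

i.e., any lossless code needs $h(p)$ bits per potential edge on average.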

Then, I will talk about the problem of compressing a graph with side information, i.e., when an additional correlated graph is available at the decoder. Turning to lossy compression, where one accepts a certain amount of distortion between the original and reconstructed graphs, I will present theoretical limits to lossy compression that we obtained for the Erdős–Rényi and stochastic block models by using rate-distortion theory.

Mon, 06 May 2024

14:00 - 15:00
Lecture Room 3

Bayesian Interpolation with Linear and Shaped Neural Networks

Boris Hanin
(Princeton University)
Abstract

This talk, based on joint work with Alexander Zlokapa, concerns Bayesian inference with neural networks. 

I will begin by presenting a result giving exact non-asymptotic formulas for Bayesian posteriors in deep linear networks. A key takeaway is the appearance of a novel scaling parameter, given by (number of data points × depth) / width, which controls the effective depth of the posterior in the limit of large model and dataset size.

Additionally, I will explain some quite recent results on the role of this effective depth parameter in Bayesian inference with deep non-linear neural networks that have shaped activations.

Mon, 29 Apr 2024

11:00 - 12:00
Lecture Room 3

Deep Gaussian processes: theory and applications

Aretha Teckentrup
(University of Edinburgh)
Further Information

Please note that this seminar starts at 11am and finishes at 12pm. 

Abstract

Deep Gaussian processes have proved remarkably successful as a tool for various statistical inference tasks. This success relates in part to the flexibility of these processes and their ability to capture complex, non-stationary behaviours. 

In this talk, we will introduce the general framework of deep Gaussian processes, in which many examples can be constructed, and demonstrate their superiority in inverse problems including computational imaging and regression.

We will discuss recent algorithmic developments for efficient sampling, as well as recent theoretical results which give crucial insight into the behaviour of the methodology.


Mon, 22 Apr 2024

14:00 - 15:00
Lecture Room 3

Quantization of Bandlimited Graph Signals

Hanna Veselovska
(Technical University of Munich)
Abstract

Graph signals provide a natural representation of data in many applications, such as social networks, web information analysis, sensor networks, and machine learning. Graph signal & data processing is currently an active field of mathematical research that aims to extend the well-developed tools for analyzing conventional signals to signals on graphs while exploiting the underlying connectivity. A key challenge in this context is the problem of quantization, that is, finding efficient ways of representing the values of graph signals with only a finite number of bits.

In this talk, we address the problem of quantizing bandlimited graph signals. We introduce two classes of noise-shaping algorithms for graph signals that differ in their sampling methodologies. We demonstrate that these algorithms can efficiently construct quantized representatives of bandlimited graph-based signals with bounded amplitude.
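The graph algorithms themselves differ, but the classical first-order Sigma-Delta quantizer conveys what noise shaping means: each sample is quantized together with the accumulated past error, so the quantization noise is pushed into a form that local averaging cancels out. A scalar sketch with illustrative signal and parameters:

```python
import numpy as np

def sigma_delta(x, levels):
    """First-order Sigma-Delta quantizer: each sample is quantized together
    with the accumulated past error u, so the quantization noise is 'shaped'
    into differences that a simple reconstruction filter can average out."""
    u, q = 0.0, np.empty_like(x)
    for i, xi in enumerate(x):
        q[i] = levels[np.argmin(np.abs(levels - (xi + u)))]  # quantize x_i + u
        u += xi - q[i]                                       # carry the error
    return q

t = np.linspace(0.0, 1.0, 400)
x = 0.8 * np.sin(2 * np.pi * 3 * t)               # smooth, "bandlimited" signal
q = sigma_delta(x, levels=np.array([-1.0, 1.0]))  # one-bit alphabet
w = np.ones(25) / 25                              # simple averaging filter
print(np.max(np.abs(np.convolve(q - x, w, mode='valid'))))  # small residual
```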

Inspired by the results of Zhang et al. in 2022, we provide theoretical guarantees on the relative error between the true signal and its quantized representative for one of the algorithms.
As will be discussed, the incoherence of the underlying graph plays an important role in the quantization process. Namely, bandlimited signals supported on graphs of lower incoherence allow for smaller relative errors. We support our findings with various numerical experiments showcasing the performance of the proposed quantization algorithms for bandlimited signals defined on graphs with different degrees of incoherence.

This is joint work with Felix Krahmer (Technical University of Munich), He Lyu (Meta), Rayan Saab (University of California San Diego), and Rongrong Wang (Michigan State University).

Mon, 08 Apr 2024

11:00 - 12:00
Lecture Room 3

Heavy-Tailed Large Deviations and Sharp Characterization of Global Dynamics of SGDs in Deep Learning

Chang-Han Rhee
(Northwestern University, USA)
Abstract

While the typical behaviors of stochastic systems are often deceptively oblivious to the tail distributions of the underlying uncertainties, the ways rare events arise are vastly different depending on whether the underlying tail distributions are light-tailed or heavy-tailed. Roughly speaking, in light-tailed settings, a system-wide rare event arises because everything goes wrong a little bit as if the entire system has conspired up to provoke the rare event (conspiracy principle), whereas, in heavy-tailed settings, a system-wide rare event arises because a small number of components fail catastrophically (catastrophe principle). In the first part of this talk, I will introduce the recent developments in the theory of large deviations for heavy-tailed stochastic processes at the sample path level and rigorously characterize the catastrophe principle for such processes. 
The empirical success of deep learning is often attributed to the mysterious ability of stochastic gradient descents (SGDs) to avoid sharp local minima in the loss landscape, as sharp minima are believed to lead to poor generalization. To unravel this mystery and potentially further enhance such capability of SGDs, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGDs' global dynamics within complex non-convex loss landscapes. In the second part of this talk, I will characterize the global dynamics of SGDs building on the heavy-tailed large deviations and local stability framework developed in the first part. This leads to the heavy-tailed counterparts of the classical Freidlin-Wentzell and Eyring-Kramers theories. Moreover, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noises during the training phase, SGD can almost completely avoid sharp minima and hence achieve better generalization performance for the test data.  
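A toy one-dimensional simulation conveys the width effect; everything here, from the loss to the noise parameters, is an illustrative assumption rather than the paper's setting. With truncated heavy-tailed noise, a single clipped jump can clear a narrow basin but not a wide one, so escape times differ by orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(8)

def escape_time(half_width, curvature, lr=1e-2, clip=60.0, max_steps=300_000):
    """Toy SGD in a quadratic basin, driven by truncated heavy-tailed noise.
    Returns the first step at which the iterate leaves the basin (a toy
    illustration of the catastrophe principle: exits are driven by a few
    large, truncated jumps, so narrow basins are abandoned much faster)."""
    x = 0.0
    for t in range(max_steps):
        jump = np.clip(rng.choice([-1.0, 1.0]) * rng.pareto(1.2), -clip, clip)
        x -= lr * (curvature * x + jump)
        if abs(x) > half_width:
            return t
    return max_steps  # never escaped within the budget

# A single truncated jump moves the iterate by at most lr * clip = 0.6: enough
# to clear a sharp basin (half-width 0.5) in one shot, while the flat basin
# (half-width 2.0) needs several rare jumps in close succession.
print("sharp basin:", escape_time(0.5, curvature=50.0))
print("flat basin: ", escape_time(2.0, curvature=2.0))
```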

This talk is based on the joint work with Mihail Bazhba, Jose Blanchet, Bohan Chen, Sewoong Oh, Zhe Su, Xingyu Wang, and Bert Zwart.


Bio:

Chang-Han Rhee is an Assistant Professor in Industrial Engineering and Management Sciences at Northwestern University. Before joining Northwestern University, he was a postdoctoral researcher at Centrum Wiskunde & Informatica and Georgia Tech. He received his Ph.D. from Stanford University. His research interests include applied probability, stochastic simulation, experimental design, and the theoretical foundation of machine learning. His research has been recognized with the 2016 INFORMS Simulation Society Outstanding Publication Award, the 2012 Winter Simulation Conference Best Student Paper Award, the 2023 INFORMS George Nicholson Student Paper Competition (2nd place), and the 2013 INFORMS George Nicholson Student Paper Competition (finalist). Since 2022, his research has been supported by the NSF CAREER Award.  

Mon, 04 Mar 2024

14:00 - 15:00
Lecture Room 3

On transport methods for simulation-based inference and data assimilation

Prof Youssef Marzouk
(MIT)
Abstract

Many practical Bayesian inference problems fall into the simulation-based or "likelihood-free" setting, where evaluations of the likelihood function or prior density are unavailable or intractable; instead one can only draw samples from the joint parameter-data prior. Learning conditional distributions is essential to the solution of these problems. 
To this end, I will discuss a powerful class of methods for conditional density estimation and conditional simulation based on transportation of measure. An important application for these methods lies in data assimilation for dynamical systems, where transport enables new approaches to nonlinear filtering and smoothing. 
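In one common formulation (stated here for concreteness), a lower-triangular, Knothe-Rosenblatt-type map is learned on the joint data-parameter prior $\pi(y, \theta)$,

$$ S(y, \theta) = \begin{pmatrix} S_1(y) \\ S_2(y, \theta) \end{pmatrix}, \qquad S_\sharp \pi = \mathcal{N}(0, I), $$

so that for an observed $y^\ast$, drawing $z \sim \mathcal{N}(0, I)$ and solving $S_2(y^\ast, \theta) = z$ for $\theta$ produces samples from the posterior $\pi(\theta \mid y^\ast)$; this is the sense in which transport turns conditional simulation into map inversion.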
To illuminate some of the theoretical underpinnings of these methods, I will discuss recent work on monotone map representations, optimization guarantees for learning maps from data, and the statistical convergence of transport-based density estimators.