Mon, 08 Apr 2024

11:00 - 12:00
Lecture Room 3

Heavy-Tailed Large Deviations and Sharp Characterization of Global Dynamics of SGDs in Deep Learning

Chang-Han Rhee
(Northwestern University, USA)
Abstract

While the typical behaviors of stochastic systems are often deceptively oblivious to the tail distributions of the underlying uncertainties, the ways rare events arise are vastly different depending on whether the underlying tail distributions are light-tailed or heavy-tailed. Roughly speaking, in light-tailed settings, a system-wide rare event arises because everything goes wrong a little bit, as if the entire system had conspired to provoke the rare event (conspiracy principle), whereas, in heavy-tailed settings, a system-wide rare event arises because a small number of components fail catastrophically (catastrophe principle). In the first part of this talk, I will introduce recent developments in the theory of large deviations for heavy-tailed stochastic processes at the sample-path level and rigorously characterize the catastrophe principle for such processes.
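As a classical one-dimensional illustration of the catastrophe principle (standard background, not a result specific to this talk), consider a sum of i.i.d. heavy-tailed random variables, written here in LaTeX notation:

% Principle of a single big jump: if X_1, ..., X_n are i.i.d. with a
% subexponential (e.g., regularly varying) right tail, then for the sum
% S_n = X_1 + ... + X_n and fixed n,
\[
  \mathbb{P}(S_n > x) \;\sim\; n\,\mathbb{P}(X_1 > x) \qquad (x \to \infty),
\]
% i.e., the rare event {S_n > x} is most likely realized by a single
% catastrophic summand exceeding x while the others stay of typical size.
% In light-tailed (Cramer-type) settings, by contrast, all n summands
% contribute comparably, matching the conspiracy principle above.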
The empirical success of deep learning is often attributed to the mysterious ability of stochastic gradient descents (SGDs) to avoid sharp local minima in the loss landscape, as sharp minima are believed to lead to poor generalization. To unravel this mystery and potentially further enhance this capability of SGDs, it is imperative to go beyond traditional local convergence analysis and obtain a comprehensive understanding of SGDs' global dynamics within complex non-convex loss landscapes. In the second part of this talk, I will characterize the global dynamics of SGDs, building on the heavy-tailed large deviations and local stability framework developed in the first part. This leads to heavy-tailed counterparts of the classical Freidlin-Wentzell and Eyring-Kramers theories. Moreover, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noise during the training phase, SGD can almost completely avoid sharp minima and hence achieve better generalization performance on test data.
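A minimal Python sketch of the inject-then-truncate mechanism described above; everything here (the toy loss, the Pareto tail index alpha, the truncation level b) is an illustrative assumption, not the authors' implementation:

import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of a toy double-well loss f(x) = (x**2 - 1)**2.
    return 4.0 * x * (x**2 - 1.0)

def sgd_inject_truncate(x0, eta=1e-2, alpha=1.5, b=0.5, n_steps=10_000):
    # SGD where each gradient is perturbed by symmetric heavy-tailed
    # (Pareto-type, tail index alpha) noise, and the resulting update is
    # truncated so that no single step exceeds b in magnitude.
    x = x0
    for _ in range(n_steps):
        noise = rng.pareto(alpha) * rng.choice([-1.0, 1.0])
        step = np.clip(-eta * (grad(x) + noise), -b, b)
        x += step
    return x

# In a landscape whose minima have basins of different widths, such
# truncated heavy-tailed dynamics exit narrow (sharp) basins far more
# often than wide (flat) ones, which is the effect described in the talk.
print(sgd_inject_truncate(x0=0.9))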

This talk is based on joint work with Mihail Bazhba, Jose Blanchet, Bohan Chen, Sewoong Oh, Zhe Su, Xingyu Wang, and Bert Zwart.
 

Bio:

Chang-Han Rhee is an Assistant Professor in Industrial Engineering and Management Sciences at Northwestern University. Before joining Northwestern University, he was a postdoctoral researcher at Centrum Wiskunde & Informatica and Georgia Tech. He received his Ph.D. from Stanford University. His research interests include applied probability, stochastic simulation, experimental design, and the theoretical foundation of machine learning. His research has been recognized with the 2016 INFORMS Simulation Society Outstanding Publication Award, the 2012 Winter Simulation Conference Best Student Paper Award, the 2023 INFORMS George Nicholson Student Paper Competition (2nd place), and the 2013 INFORMS George Nicholson Student Paper Competition (finalist). Since 2022, his research has been supported by the NSF CAREER Award.  
 

Mon, 30 Apr 2018

15:45 - 16:45
L3

Ricci Flow, Stochastic Analysis, and Functional Inequalities on Manifolds with Time-Dependent Riemannian Metrics

Elton Hsu
(Northwestern University, USA)
Abstract

Stochastic analysis on a Riemannian manifold is a well-developed area of research in probability theory. We will discuss some recent developments in stochastic analysis on a manifold whose Riemannian metric evolves with time, a typical case of which is the Ricci flow. Familiar results such as stochastic parallel transport, the integration by parts formula, the martingale representation theorem, and functional inequalities have interesting extensions from time-independent metrics to time-dependent ones. In particular, we will discuss an extension of Beckner's inequality on the path space over a Riemannian manifold with time-dependent metrics. The classical version of this inequality includes the Poincaré inequality and the logarithmic Sobolev inequality as special cases.
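For orientation, here is Beckner's inequality in its classical Gaussian form (background only; the talk concerns an extension to path space with time-dependent metrics), in LaTeX notation:

% Beckner's inequality for the standard Gaussian measure \gamma on R^n:
% for every smooth f and every 1 <= p <= 2,
\[
  \int f^2 \, d\gamma \;-\; \Bigl( \int |f|^p \, d\gamma \Bigr)^{2/p}
  \;\le\; (2 - p) \int |\nabla f|^2 \, d\gamma .
\]
% Taking p = 1 recovers the Poincare inequality (a variance bound), while
% dividing by 2 - p and letting p -> 2 yields the logarithmic Sobolev
% inequality.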

 

Thu, 26 Nov 2015

16:00 - 17:00
L3

Attributes and Artifacts of Network Optimization

Adilson E Motter
(Northwestern University, USA)
Abstract

Much of the recent interest in complex networks has been driven by the prospect that network optimization will help us understand the workings of evolutionary pressure in natural systems and the design of efficient engineered systems. In this talk, I will reflect on unanticipated attributes and artifacts in three classes of network optimization problems. First, I will discuss implications of optimization for the metabolic activity of living cells and its role in giving rise to the recently discovered phenomenon of synthetic rescues. Then I will comment on the problem of controlling network dynamics and show that theoretical results on optimizing the number of driver nodes/variables often offer only a conservative lower bound on the number actually needed in practice. Finally, I will discuss the sensitive dependence of network dynamics on network structure that emerges in the optimization of network topology for dynamical processes governed by eigenvalue spectra, such as synchronization and consensus processes. Optimization is a double-edged sword whose desired and adverse effects can both be exacerbated in complex network systems due to the high dimensionality of their dynamics.
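As context for the eigenvalue-spectra optimization mentioned above, a short Python snippet computing the Laplacian eigenratio, a standard proxy for the synchronizability of diffusively coupled oscillators (a common formulation in the literature, assumed here rather than taken from the talk):

import numpy as np

def laplacian_eigenratio(A):
    # A: symmetric adjacency matrix of a connected undirected graph.
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))    # ascending; eig[0] is ~0
    return eig[-1] / eig[1]                 # smaller ratio => easier to synchronize

# Example: a ring of 6 nodes vs. the complete graph on 6 nodes.
ring = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
complete = np.ones((6, 6)) - np.eye(6)
print(laplacian_eigenratio(ring), laplacian_eigenratio(complete))

Optimizing network topology to shrink this ratio is one concrete instance of the spectra-based design problems, and their sensitive dependence on structure, discussed in the talk.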

Thu, 04 Feb 2010

12:30 - 13:30
Gibson 1st Floor SR

Transonic shocks in divergent nozzles

Myoungjean Bae
(Northwestern University, USA)
Abstract

One of the important subjects in the study of transonic flow is to understand the global structure of flow through a convergent-divergent nozzle, the so-called de Laval nozzle. Depending on the pressure at the exit of the de Laval nozzle, various flow patterns may occur. As an attempt to understand such phenomena, we introduce a new potential flow model, the 'non-isentropic potential flow system', which allows a jump of the entropy across a shock, and we use this model to rigorously prove the unique existence and stability of transonic shocks for a fixed exit pressure. This is joint work with Mikhail Feldman.
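For orientation, the classical steady isentropic potential flow model (standard background; the talk's non-isentropic system modifies this setup to permit an entropy jump across the shock), in LaTeX notation:

% Steady compressible potential flow: velocity potential \varphi,
% density \rho, pressure p, adiabatic exponent \gamma > 1.
\[
  \nabla \cdot \bigl( \rho \, \nabla \varphi \bigr) = 0,
  \qquad
  \frac{|\nabla \varphi|^2}{2} + \frac{\gamma}{\gamma - 1}\,\frac{p}{\rho}
  = \text{const (Bernoulli's law)}.
\]
% The equation is elliptic where the flow is subsonic (|\nabla\varphi|
% below the local sound speed) and hyperbolic where it is supersonic,
% so a transonic shock is an interface separating the two regimes.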

Mon, 02 Jun 2008
14:15
Oxford-Man Institute

Cameron-Martin Theorem for Riemannian Manifolds

Prof Elton Hsu
(Northwestern University, USA)
Abstract

The Cameron-Martin theorem is a fundamental result in stochastic analysis. We will show that the Wiener measure on a geometrically and stochastically complete Riemannian manifold is quasi-invariant. This is a complete generalization of the classical Cameron-Martin theorem for Euclidean space to Riemannian manifolds. We do not impose any curvature growth conditions.
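For comparison, the classical Euclidean Cameron-Martin theorem (standard background for the Riemannian generalization discussed in the talk), in LaTeX notation:

% Let \mu be Wiener measure on C([0,1]; R^d) and let h belong to the
% Cameron-Martin space: h(0) = 0, h absolutely continuous, \dot h in L^2.
% Then the shifted measure \mu_h(A) = \mu(A - h) is equivalent to \mu, with
\[
  \frac{d\mu_h}{d\mu}(W)
  = \exp\!\Bigl( \int_0^1 \langle \dot h(t), dW_t \rangle
      - \tfrac{1}{2} \int_0^1 |\dot h(t)|^2 \, dt \Bigr).
\]
% On a Riemannian manifold, roughly speaking, the straight-line shift is
% replaced by a flow constructed through stochastic parallel transport.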

Tue, 11 Sep 2007
16:00
L1

On Nonlinear Partial Differential Equations of Mixed Type

Gui-Qiang Chen
(Northwestern University, USA)
Abstract
In this talk we will discuss some recent developments in the study of nonlinear partial differential equations of mixed type, including the mixed parabolic-hyperbolic type and the mixed elliptic-hyperbolic type. Examples include nonlinear degenerate diffusion-convection equations and transonic flow equations in fluid mechanics, as well as nonlinear equations of mixed type in a fluid mechanical formulation of isometric embedding problems in differential geometry. Further ideas, trends, and open problems in this direction will also be addressed.
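A canonical example of an equation of mixed elliptic-hyperbolic type (classical background for the class of equations discussed above) is the Tricomi equation, in LaTeX notation:

\[
  y \, u_{xx} + u_{yy} = 0 ,
\]
% elliptic in the half-plane y > 0 and hyperbolic in y < 0, with the line
% y = 0 as the degeneracy (sonic) line. Transonic flow equations exhibit
% the same type change across the sonic surface, except that there the
% type is determined by the solution itself, a main source of difficulty.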