Mon, 26 Feb 2024

14:00 - 15:00
Lecture Room 3

Fantastic Sparse Neural Networks and Where to Find Them

Dr Shiwei Liu
(Mathematical Institute, University of Oxford)
Abstract

Sparse neural networks, in which a substantial portion of the components are eliminated, have widely shown their versatility in model compression, robustness improvement, and overfitting mitigation. However, traditional methods for obtaining such sparse networks usually involve a fully pre-trained, dense model. As foundation models become prevalent, the cost of this pre-training step can be prohibitive. On the other hand, training intrinsically sparse neural networks from scratch usually leads to inferior performance compared to their dense counterparts.

In this talk, I will present a series of approaches for obtaining such fantastic sparse neural networks by training from scratch, without any dense pre-training step: dynamic sparse training, static sparse training with random pruning, and mask-only training with no weight updates. First, I will introduce the concept of in-time over-parameterization (ITOP) (ICML 2021), which enables training sparse neural networks from scratch (commonly known as sparse training) to attain the full accuracy of dense models. By dynamically exploring new sparse topologies during training, we avoid the costly necessity of pre-training and re-training, requiring only a single training run to obtain strong sparse neural networks. Second, because ITOP incurs additional overhead from its frequent changes of sparse topology, our follow-up work (ICLR 2022) demonstrates that even a naïve, static sparse network produced by random pruning can be trained to dense-model performance, provided the model is made sufficiently large. Finally, I will discuss how we can push training efficiency to the extreme by learning only masks at initialization, without any weight updates, addressing the over-smoothing challenge in building deep graph neural networks (LoG 2022).
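
As an illustration of the dynamic sparse training idea, the following is a minimal NumPy sketch of a single prune-and-regrow topology update. It is a generic sketch of the mechanism rather than the code from the ITOP paper, and the function name, pruning fraction, and layer sizes are invented for the example.

import numpy as np

def prune_and_grow(w, mask, frac=0.1, rng=None):
    # One topology update: drop the smallest-magnitude active weights and
    # regrow the same number at random inactive positions, so the overall
    # sparsity level stays fixed while new topologies are explored.
    rng = rng or np.random.default_rng(0)
    active = np.flatnonzero(mask)
    k = max(1, int(frac * active.size))
    drop = active[np.argsort(np.abs(w.flat[active]))[:k]]   # prune by magnitude
    mask.flat[drop] = False
    w.flat[drop] = 0.0
    inactive = np.flatnonzero(~mask)
    grow = rng.choice(inactive, size=k, replace=False)       # regrow at random
    mask.flat[grow] = True
    w.flat[grow] = 0.0   # new connections start at zero and are trained afterwards
    return w, mask

# toy usage: an 80%-sparse 64x64 layer, one topology update
rng = np.random.default_rng(1)
mask = rng.random((64, 64)) < 0.2
w = rng.standard_normal((64, 64)) * mask
w, mask = prune_and_grow(w, mask, frac=0.1, rng=rng)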

Mon, 12 Feb 2024

14:00 - 15:00
Lecture Room 3

Do Stochastic, Feel Noiseless: Stable Optimization via a Double Momentum Mechanism

Kfir Levy
(Technion – Israel Institute of Technology)
Abstract

The tremendous success of the Machine Learning paradigm relies heavily on the development of powerful optimization methods, and the canonical algorithm for training learning models is SGD (Stochastic Gradient Descent). Nevertheless, the latter is quite different from Gradient Descent (GD), which is its noiseless counterpart. Concretely, SGD requires a careful choice of the learning rate, which depends on the properties of the noise as well as the quality of the initialization.
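
For illustration only, here are the standard rates for convex, L-smooth objectives (textbook bounds, not results from the talk), which make this dependence explicit. With stochastic gradients of variance σ² and D = ‖x₀ − x*‖, SGD guarantees

\[ \mathbb{E}\, f(\bar{x}_T) - f(x^*) \;\lesssim\; \frac{L D^2}{T} \;+\; \frac{\sigma D}{\sqrt{T}} \]

only when the learning rate is tuned to roughly min{1/L, D/(σ√T)}, which depends on both the noise level σ and the initialization quality D, whereas noiseless GD can simply use the fixed learning rate 1/L.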

SGD further requires the use of a test set to estimate the generalization error throughout its run. In this talk, we will present a new SGD variant that obtains the same optimal rates as SGD while using noiseless machinery, as in GD. Concretely, it can use the same fixed learning rate as GD and does not require a test/validation set. Curiously, our results rely on a novel gradient estimate that combines two recent mechanisms, both related to the notion of momentum.

Finally, time permitting, I will discuss several applications to which our method can be extended.
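
To make the "double momentum" idea concrete, here is a minimal NumPy sketch that combines two momentum-like mechanisms, a recursive average of the stochastic gradients and an averaging of the query points, while keeping a fixed, GD-style learning rate. This is a generic illustration under my own naming and parameter choices, not the algorithm presented in the talk.

import numpy as np

def double_momentum_sgd(stoch_grad, x0, steps, lr=0.1, beta=0.9, alpha=0.9):
    # Mechanism 1: momentum on the gradient estimate (damps gradient noise).
    # Mechanism 2: averaging of the iterates (damps iterate noise).
    x = x0.copy()
    x_avg = x0.copy()
    d = np.zeros_like(x0)
    for _ in range(steps):
        g = stoch_grad(x_avg)                      # noisy gradient at the averaged point
        d = beta * d + (1.0 - beta) * g            # momentum-style gradient estimate
        x = x - lr * d                             # fixed learning rate, as in GD
        x_avg = alpha * x_avg + (1.0 - alpha) * x  # running average of the iterates
    return x_avg

# toy usage: noisy gradients of f(x) = ||x||^2
rng = np.random.default_rng(0)
noisy_grad = lambda z: 2.0 * z + 0.1 * rng.standard_normal(z.shape)
x_hat = double_momentum_sgd(noisy_grad, np.ones(5), steps=2000)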

Mon, 05 Feb 2024

14:00 - 15:00
Lecture Room 3

Exploiting Symmetries for Learning in Deep Weight Spaces

Haggai Maron
(NVIDIA)
Abstract

Learning to process and analyze the raw weight matrices of neural networks is an emerging research area with intriguing potential applications like editing and analyzing Implicit Neural Representations (INRs), weight pruning/quantization, and function editing. However, weight spaces have inherent permutation symmetries – permutations can be applied to the weights of an architecture, yielding new weights that represent the same function. As with other data types like graphs and point clouds, these symmetries make learning in weight spaces challenging.
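
A short NumPy check makes the permutation symmetry concrete: relabelling the hidden neurons of a two-layer MLP (permuting the rows of the first weight matrix and bias, and the columns of the second weight matrix) gives different weights that compute exactly the same function. The layer sizes here are arbitrary choices for the example.

import numpy as np

def mlp(x, W1, b1, W2, b2):
    # two-layer MLP with ReLU hidden units
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((2, 8)), rng.standard_normal(2)
x = rng.standard_normal(3)

# permute the hidden neurons: same function, different weights
P = np.eye(8)[rng.permutation(8)]
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, P @ W1, P @ b1, W2 @ P.T, b2))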

This talk will give an overview of recent advances in designing architectures that can effectively operate on weight spaces while respecting their underlying symmetries. First, we will discuss our ICML 2023 paper, which introduces novel equivariant architectures for learning on multilayer perceptron weight spaces. We first characterize all linear layers that are equivariant to these symmetries and then construct networks composed of such layers. We then turn to our ICLR 2024 work, which generalizes the approach to diverse network architectures using what we term Graph Metanetworks (GMN). This is done by representing input networks as graphs and processing them with graph neural networks. We show that the resulting metanetworks are expressive and equivariant to the weight-space symmetries of the architecture being processed. Our graph metanetworks are applicable to CNNs, attention layers, normalization layers, and more. Together, these works take promising steps toward versatile and principled architectures for weight-space learning.
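
As a sketch of the graph-metanetwork viewpoint (my own minimal construction, not the paper's implementation), an MLP can be turned into a graph with one node per neuron and one edge per weight; any graph neural network applied to this graph is then insensitive to how the hidden neurons are numbered, which is precisely the weight-space symmetry above.

import numpy as np

def mlp_to_graph(weight_matrices):
    # Nodes are neurons, directed edges are weights, edge features are weight values.
    sizes = [weight_matrices[0].shape[1]] + [W.shape[0] for W in weight_matrices]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    edges, feats = [], []
    for layer, W in enumerate(weight_matrices):
        for i in range(W.shape[0]):        # target neuron in layer + 1
            for j in range(W.shape[1]):    # source neuron in layer
                edges.append((offsets[layer] + j, offsets[layer + 1] + i))
                feats.append(W[i, j])
    return np.array(edges), np.array(feats)

# toy usage: a 3 -> 8 -> 2 MLP becomes a graph with 13 nodes and 40 edges
rng = np.random.default_rng(0)
edges, feats = mlp_to_graph([rng.standard_normal((8, 3)), rng.standard_normal((2, 8))])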

Mon, 22 Jan 2024

14:00 - 15:00
Lecture Room 3

Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences

Prof. Justin Sirignano
(Mathematical Institute, University of Oxford)
Abstract

Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. 
The analysis requires addressing several challenges that are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude O(1/N) and the number of updates is O(N). The system can therefore be represented as an Euler approximation of an appropriate ODE/PDE, to which it converges as N → ∞. However, the RNN hidden-layer updates are O(1). Therefore, RNNs cannot be represented as a discretization of an ODE/PDE, and standard mean-field techniques cannot be applied. Instead, we develop a fixed-point analysis for the evolution of the RNN memory state, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods allow us to prove a neural tangent kernel (NTK) limit for RNNs trained on data sequences as the number of data samples and the size of the neural network grow to infinity.
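
To make the scaling contrast concrete, here is the argument in schematic, generic notation (not the paper's exact equations). A mean-field parameter update of the form

\[ \theta_{k+1} \;=\; \theta_k + \tfrac{1}{N}\, F(\theta_k), \qquad k = 0, \dots, \lfloor N t \rfloor, \]

is an Euler scheme for the ODE \dot{\Theta}_t = F(\Theta_t), to whose solution it converges as N → ∞. By contrast, a hidden-state recursion such as

\[ h_{k+1} \;=\; \sigma\big( W h_k + U x_{k+1} \big) \]

changes the state by an O(1) amount at every step, so no Euler/ODE structure is available and the limit has to be captured through the fixed-point analysis described above.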

Mon, 29 Jan 2024
15:30
Lecture Room 5

A rigorous approach to the Dean-Kawasaki equation of fluctuating hydrodynamics

Professor Julian Fischer
(Institute of Science and Technology Austria)
Abstract

Fluctuating hydrodynamics provides a framework for approximating density fluctuations in interacting particle systems by suitable SPDEs. The Dean-Kawasaki equation - a strongly singular SPDE - is perhaps the most basic equation of fluctuating hydrodynamics; it has been proposed in the physics literature to describe the fluctuations of the density of N diffusing weakly interacting particles in the regime of large particle numbers N. The strongly singular nature of the Dean-Kawasaki equation presents a substantial challenge for both its analysis and its rigorous mathematical justification: Besides being non-renormalizable by approaches like regularity structures, it has recently been shown to not even admit nontrivial martingale solutions.
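
For orientation, one common normalization of the Dean-Kawasaki equation for N weakly interacting Brownian particles with pair potential V (conventions for the constants vary, and this is quoted as background rather than taken from the talk) reads

\[ \partial_t \rho \;=\; \tfrac{1}{2}\,\Delta \rho \;+\; \nabla\!\cdot\!\big( \rho\,(\nabla V * \rho) \big) \;+\; N^{-1/2}\,\nabla\!\cdot\!\big( \sqrt{\rho}\,\xi \big), \]

where ρ is the empirical particle density and ξ is vector-valued space-time white noise; the N^{-1/2} ∇·(√ρ ξ) term is the source of the strong singularity.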

In this talk, we give an overview of recent quantitative results on the justification of fluctuating hydrodynamics models. In particular, we give an interpretation of the Dean-Kawasaki equation as a "recipe" for accurate and efficient numerical simulation of the density fluctuations of weakly interacting diffusing particles, allowing for an error of arbitrarily high order in the inverse particle number.

Based on joint works with Federico Cornalba, Jonas Ingmanns, and Claudia Raithel.
