Forthcoming events in this series

Wed, 29 Jun 2022

16:00 - 17:00

### Information theory with kernel methods

Francis Bach
(INRIA - Ecole Normale Supérieure)
Further Information
Abstract

I will consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. In this talk, I will show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. I will also present how these new notions of relative entropy lead to new upper bounds on log partition functions that can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods (based on https://arxiv.org/pdf/2202.08545.pdf, see also https://francisbach.com/information-theory-with-kernel-methods/).
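As a rough illustration of the first idea in this abstract, the sketch below computes the von Neumann entropy of an empirical kernel covariance operator. It relies on the standard fact that this operator shares its nonzero eigenvalues with the normalised Gram matrix `K / n`. The Gaussian kernel, the `bandwidth` parameter and the function names are assumptions made for illustration; the estimators presented in the talk may differ.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel, which satisfies k(x, x) = 1."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kernel_von_neumann_entropy(x, bandwidth=1.0):
    """-tr(S log S) for the empirical covariance operator S of the sample x."""
    n = len(x)
    # K / n shares its nonzero eigenvalues with the empirical covariance operator.
    eigvals = np.linalg.eigvalsh(gaussian_gram(x, bandwidth) / n)
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 * log(0) = 0
    return float(-np.sum(eigvals * np.log(eigvals)))

# Illustrative inputs (assumed): a point mass and six far-apart points.
entropy_point_mass = kernel_von_neumann_entropy(np.zeros(5))
entropy_spread = kernel_von_neumann_entropy(np.arange(6) * 100.0)
```

With a unit-diagonal kernel the eigenvalues sum to one, so the quantity behaves like the entropy of a discrete distribution: it vanishes for a point mass and approaches `log(n)` when the `n` sample points are far apart relative to the bandwidth.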

Thu, 26 May 2022

16:00 - 17:00
Virtual

### Tensor Product Kernels for Independence

Zoltan Szabo
(London School of Economics)
Further Information
Abstract

The Hilbert-Schmidt independence criterion (HSIC) is among the most widely used approaches in machine learning and statistics for measuring the independence of random variables. Despite its popularity and success in numerous applications, quite little is known about when HSIC characterizes independence. I am going to provide a complete answer to this question, with conditions which are often easy to verify in practice.

This talk is based on joint work with Bharath Sriperumbudur.
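For readers unfamiliar with HSIC, here is a minimal sketch of the standard biased (V-statistic) empirical estimate, `trace(K H L H) / n^2`, where `K` and `L` are Gram matrices and `H` is the centring matrix. The Gaussian kernels, the shared `bandwidth` and the toy data are assumptions for illustration only; the talk concerns when the population quantity characterizes independence, not this particular estimator.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for a sample of n points."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def hsic_biased(x, y, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
hsic_dependent = hsic_biased(x, x ** 2)                   # uncorrelated but dependent
hsic_independent = hsic_biased(x, rng.normal(size=200))   # independent draws
```

The pair `(x, x ** 2)` is uncorrelated yet dependent, which is the kind of relationship a kernel-based criterion is meant to pick up; one would expect the first value to be noticeably larger than the second.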

Wed, 20 Apr 2022

09:00 - 10:00
Virtual

### Optimization, Speed-up, and Out-of-distribution Prediction in Deep Learning

Wei Chen
Further Information
Abstract

In this talk, I will introduce our investigations on how to make deep learning easier to optimize, faster to train, and more robust in out-of-distribution prediction. To be specific, we design a group-invariant optimization framework for ReLU neural networks; we compensate for the gradient delay in asynchronous distributed training; and we improve out-of-distribution prediction by incorporating “causal” invariance.

Thu, 24 Mar 2022

16:00 - 17:00
Virtual

### The Geometry of Linear Convolutional Networks

Kathlén Kohn
(KTH Royal Institute of Technology)
Further Information
Abstract

We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (that is, the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the function space.

For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent.

This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.
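The identification of the function space with factorizable polynomials can be checked numerically in the one-dimensional, stride-one case: composing convolution layers convolves their filters, which is the same as multiplying the polynomials whose coefficients are the filter entries. The sketch below is an illustration under assumptions: the filter values are arbitrary, and it uses NumPy's true convolution rather than the cross-correlation convention common in deep learning libraries.

```python
import numpy as np

def end_to_end_filter(filters):
    """Compose stride-one 1D convolution layers into a single filter."""
    composed = np.array([1.0])
    for w in filters:
        composed = np.convolve(composed, w)  # same as multiplying polynomials
    return composed

def apply_network(x, filters):
    """Apply the layers one after another (no bias, no nonlinearity)."""
    for w in filters:
        x = np.convolve(x, w, mode="valid")
    return x

# Hypothetical three-layer network with filter sizes 2, 2 and 3.
filters = [np.array([1.0, -2.0]), np.array([1.0, 3.0]), np.array([2.0, 1.0, 1.0])]
x = np.arange(10, dtype=float)
composed = end_to_end_filter(filters)
layerwise_output = apply_network(x, filters)
single_filter_output = np.convolve(x, composed, mode="valid")
```

Here the end-to-end filter is a degree-four polynomial that factors into two linear terms and one quadratic term, so which end-to-end filters are reachable depends on the chosen filter sizes.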

Thu, 10 Feb 2022

16:00 - 17:00
Virtual

### Non-Parametric Estimation of Manifolds from Noisy Data

Yariv Aizenbud
(Yale University)
Further Information
Abstract

In many data-driven applications, the data follows some geometric structure, and the goal is to recover this structure. In many cases, the observed data is noisy and the recovery task is even more challenging. A common assumption is that the data lies on a low dimensional manifold. Estimating a manifold from noisy samples has proven to be a challenging task. Indeed, even after decades of research, there was no (computationally tractable) algorithm that accurately estimates a manifold from noisy samples with a constant level of noise.

In this talk, we will present a method that estimates a manifold and its tangent. Moreover, we establish convergence rates, which are essentially as good as existing convergence rates for function estimation.

This is joint work with Barak Sober.

Thu, 03 Feb 2022

16:00 - 17:00
Virtual

### Optimal Thinning of MCMC Output

Chris Oates
(Newcastle University)
Further Information
Abstract

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Here we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.
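The greedy selection described in this abstract can be sketched from scratch as follows. This is not the Stein Thinning package and does not use its API. It assumes an inverse multiquadric base kernel `(c^2 + |x - y|^2)^beta` with `c = 1` and `beta = -1/2`, a standard Gaussian target (score `-x`), and a small sample so that the full kernel matrix fits in memory.

```python
import numpy as np

def stein_kernel_imq(x, y, sx, sy, c=1.0, beta=-0.5):
    """Langevin Stein kernel built from k(x, y) = (c^2 + |x - y|^2)^beta.

    x, y have shapes (n, d) and (m, d); sx, sy hold grad log p at those points.
    Returns the (n, m) matrix of Stein kernel values.
    """
    r = x[:, None, :] - y[None, :, :]              # pairwise differences
    r2 = np.sum(r ** 2, axis=-1)
    q = c ** 2 + r2
    d = x.shape[1]
    div_term = (-4.0 * beta * (beta - 1.0) * q ** (beta - 2.0) * r2
                - 2.0 * beta * d * q ** (beta - 1.0))
    score_diff = sy[None, :, :] - sx[:, None, :]
    cross_term = 2.0 * beta * q ** (beta - 1.0) * np.sum(r * score_diff, axis=-1)
    score_term = q ** beta * (sx @ sy.T)
    return div_term + cross_term + score_term

def stein_thinning(samples, scores, m):
    """Greedily pick m indices (repeats allowed) to reduce the kernel Stein discrepancy."""
    kp = stein_kernel_imq(samples, samples, scores, scores)
    objective = 0.5 * np.diag(kp).copy()
    selected = []
    for _ in range(m):
        j = int(np.argmin(objective))
        selected.append(j)
        objective += kp[:, j]  # account for the newly selected point
    return np.array(selected)

# Stand-in for MCMC output (assumed): biased draws, thinned towards N(0, I).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.3, scale=1.3, size=(300, 2))
scores = -samples  # grad log density of the standard Gaussian target
idx = stein_thinning(samples, scores, 20)
```

Each step costs one pass over the sample, so the whole selection is quadratic in the sample size because of the precomputed kernel matrix; a production implementation would evaluate kernel columns on demand.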

Thu, 27 Jan 2022

16:00 - 17:00
Virtual

### Learning Homogenized PDEs in Continuum Mechanics

Andrew Stuart
(Caltech)
Further Information
Abstract

Neural networks have shown great success at learning function approximators between spaces X and Y, in the setting where X is a finite dimensional Euclidean space and where Y is either a finite dimensional Euclidean space (regression) or a set of finite cardinality (classification); the neural networks learn the approximator from N data pairs {x_n, y_n}. In many problems arising in the physical and engineering sciences it is desirable to generalize this setting to learn operators between spaces of functions X and Y. The talk will overview recent work in this context.

Then the talk will focus on work aimed at addressing the problem of learning operators which define the constitutive model characterizing the macroscopic behaviour of multiscale materials arising in material modeling. Mathematically this corresponds to using machine learning to determine appropriate homogenized equations, using data generated at the microscopic scale. Applications to visco-elasticity and crystal-plasticity are given.

Thu, 13 Jan 2022

16:00 - 17:00
Virtual

### Regularity structures and machine learning

Ilya Chevyrev
(Edinburgh University)
Further Information
Abstract

In many machine learning tasks, it is crucial to extract low-dimensional and descriptive features from a data set. In this talk, I present a method to extract features from multi-dimensional space-time signals which is motivated, on the one hand, by the success of path signatures in machine learning, and on the other hand, by the success of models from the theory of regularity structures in the analysis of PDEs. I will present a flexible definition of a model feature vector along with numerical experiments in which we combine these features with basic supervised linear regression to predict solutions to parabolic and dispersive PDEs with a given forcing and boundary conditions. Interestingly, in the dispersive case, the prediction power relies heavily on whether the boundary conditions are appropriately included in the model. The talk is based on the following joint work with Andris Gerasimovics and Hendrik Weber: https://arxiv.org/abs/2108.05879

Wed, 12 Jan 2022

09:00 - 10:00
Virtual

### Learning and Learning to Solve PDEs

Bin Dong
(Peking University)
Further Information
Abstract

Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, etc. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on some recent impacts of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations, and how it may guide us in designing new models and algorithms for some scientific computing tasks. On the one hand, I will present some of our works on the design of interpretable data-driven models for system identification and model reduction. On the other hand, I will present our recent attempts at combining wisdom from numerical PDEs and machine learning to design data-driven solvers for PDEs and their applications in electromagnetic simulation.

Thu, 14 Oct 2021

16:00 - 17:00
Virtual

### Kernel-based Statistical Methods for Functional Data

George Wynne
(Imperial College London)
Further Information

www.datasig.ac.uk/events

Abstract

Kernel-based statistical algorithms have found wide success in statistical machine learning in the past ten years as a non-parametric, easily computable engine for reasoning with probability measures. The main idea is to use a kernel to facilitate a mapping of probability measures, the objects of interest, into well-behaved spaces where calculations can be carried out. This methodology has found wide application, for example in two-sample testing, independence testing, goodness-of-fit testing, parameter inference and MCMC thinning. Most theoretical investigations and practical applications have focused on Euclidean data. This talk will outline work that adapts the kernel-based methodology to data in an arbitrary Hilbert space, which opens the door to applications for functional data, where a single data sample is a discretely observed function, for example time series or random surfaces. Such data is becoming increasingly prominent within the statistical community and in machine learning. Emphasis shall be given to the two-sample and goodness-of-fit testing problems.

Wed, 22 Sep 2021

09:00 - 10:00
Virtual

### Stochastic Flows and Rough Differential Equations on Foliated Spaces

Yuzuru Inahama
(Kyushu University)
Further Information
Abstract

Stochastic differential equations (SDEs) on compact foliated spaces were introduced a few years ago. As a corollary, a leafwise Brownian motion on a compact foliated space was obtained as a solution to an SDE. In this work we construct stochastic flows associated with the SDEs by using rough path theory, which is something like a 'deterministic version' of Ito's SDE theory.

This is joint work with Kiyotaka Suzaki.

Wed, 08 Sep 2021

09:00 - 10:00
Virtual

### Co-clustering Analysis of Multidimensional Big Data

Hong Yan
(City University of Hong Kong)
Further Information
Abstract

Although a multidimensional data array can be very large, it may contain coherence patterns much smaller in size. For example, we may need to detect a subset of genes that co-express under a subset of conditions. In this presentation, we discuss our recently developed co-clustering algorithms for the extraction and analysis of coherent patterns in big datasets. In our method, a co-cluster, corresponding to a coherent pattern, is represented as a low-rank tensor and it can be detected from the intersection of hyperplanes in a high dimensional data space. Our method has been used successfully for DNA and protein data analysis, disease diagnosis, drug therapeutic effect assessment, and feature selection in human facial expression classification. Our method can also be useful for many other real-world data mining, image processing and pattern recognition applications.

Thu, 10 Jun 2021

16:00 - 17:00
Virtual

### Refining Data-Driven Market Simulators and Managing their Risks

Blanka Horvath
(King's College London)
Further Information
Abstract

Techniques that address sequential data have been a central theme in machine learning research in the past years. More recently, such considerations have entered the field of finance-related ML applications in several areas where we face inherently path dependent problems: from (deep) pricing and hedging (of path-dependent options) to generative modelling of synthetic market data, which we refer to as market generation.

We revisit Deep Hedging from the perspective of the role of the data streams used for training and highlight how this perspective motivates the use of highly-accurate generative models for synthetic data generation. From this, we draw conclusions regarding the implications for risk management and model governance of these applications, in contrast to risk management in classical quantitative finance approaches.

Indeed, financial ML applications and their risk management heavily rely on a solid means of measuring and efficiently computing (similarity-)metrics between datasets consisting of sample paths of stochastic processes. Stochastic processes are at their core random variables with values on path space. However, while the distance between two (finite dimensional) distributions was historically well understood, the extension of this notion to the level of stochastic processes remained a challenge until recently. We discuss the effect of different choices of such metrics while revisiting some topics that are central to ML-augmented quantitative finance applications (such as the synthetic generation and the evaluation of similarity of data streams) from a regulatory (and model governance) perspective. Finally, we discuss the effect of considering refined metrics which respect and preserve the information structure (the filtration) of the market and the implications and relevance of such metrics on financial results.

Thu, 03 Jun 2021

16:00 - 17:00
Virtual

### Kinetic Brownian motion in the diffeomorphism group of a closed Riemannian manifold

Ismaël Bailleul
(Université de Rennes)
Further Information
Abstract

In its simplest instance, kinetic Brownian motion in $\mathbb{R}^d$ is a $C^1$ random path $(m_t, v_t)$ whose unit velocity $v_t$ is a Brownian motion on the unit sphere run at speed $a > 0$. Properly time-rescaled as a function of the parameter $a$, its position process converges to a Brownian motion in $\mathbb{R}^d$ as $a$ tends to infinity. On the other hand, the motion converges to straight-line motion (that is, geodesic motion) as $a$ goes to 0. Kinetic Brownian motion thus provides an interpolation between geodesic and Brownian flows in this setting. Now consider replacing $\mathbb{R}^d$ with the diffeomorphism group of a fluid domain, with the velocity now a vector field on the domain. I will explain how one can prove in this setting an interpolation result similar to the previous one, giving an interpolation between Euler’s equations of incompressible flows and a Brownian-like flow on the diffeomorphism group.

Thu, 13 May 2021

16:00 - 17:00
Virtual

### High-dimensional, multiscale online changepoint detection

Richard Samworth
(DPMMS University of Cambridge)
Further Information
Abstract

We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package 'ocd', and we also demonstrate its utility on a seismology data set.
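The sense in which such a procedure is "online" can be illustrated with a heavily simplified sketch. It is not the `ocd` algorithm: it runs one-sided CUSUM recursions of N(0, 1) against N(±b, 1) in each coordinate over an assumed grid of scales `b`, and aggregates by taking a plain maximum, whereas the method in the talk aggregates across scales and coordinates in a more refined way. The class name, the scale grid and the aggregation rule are all assumptions.

```python
import numpy as np

class MultiscaleCusum:
    """Per-coordinate CUSUM tests of mean 0 against means +/- b over a grid of scales b."""

    def __init__(self, p, scales=(0.25, 0.5, 1.0, 2.0)):
        signed = np.array([s * sign for s in scales for sign in (1.0, -1.0)])
        self.alternatives = signed[None, :]       # shape (1, 2 * len(scales))
        self.stats = np.zeros((p, signed.size))   # fixed-size state

    def update(self, x):
        """Consume one p-variate observation and return the aggregated statistic."""
        b = self.alternatives
        # Log-likelihood ratio of N(b, 1) against N(0, 1), per coordinate and scale.
        increment = b * (np.asarray(x, dtype=float)[:, None] - b / 2.0)
        self.stats = np.maximum(self.stats + increment, 0.0)
        return float(self.stats.max())

# Illustrative stream (assumed): no signal, then a unit mean shift in coordinate 1.
detector = MultiscaleCusum(p=3)
before_change = [detector.update([0.0, 0.0, 0.0]) for _ in range(5)]
after_change = [detector.update([0.0, 1.0, 0.0]) for _ in range(10)]
```

The detector stores only a `p` by `2 * len(scales)` array, so memory and per-observation cost do not grow with the number of observations seen. A changepoint would be declared when the returned statistic crosses a threshold calibrated to the desired patience.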

Thu, 06 May 2021

16:00 - 17:00
Virtual

### New perspectives on rough paths, signatures and signature cumulants

Peter K Friz
(Berlin University of Technology)
Further Information
Abstract

We revisit rough paths and signatures from a geometric and "smooth model" perspective. This provides a lean framework to understand and formulate key concepts of the theory, including recent insights on higher-order translation, also known as renormalization of rough paths. This first part is joint work with C Bellingeri (TU Berlin) and S Paycha (U Potsdam). In a second part, we take a semimartingale perspective and more specifically analyze the structure of expected signatures when written in exponential form. Following Bonnier-Oberhauser (2020), we call the resulting objects signature cumulants. These can be described, and recursively computed, in a way that can be seen as a unification of previously unrelated pieces of mathematics, including Magnus (1954), Lyons-Ni (2015), Gatheral and coworkers (2017 onwards) and Lacoin-Rhodes-Vargas (2019). This is joint work with P Hager and N Tapia.

Thu, 29 Apr 2021

16:00 - 17:00
Virtual

### Nonlinear Independent Component Analysis: Identifiability, Self-Supervised Learning, and Likelihood

Aapo Hyvärinen
(University of Helsinki)
Further Information
Abstract

Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, especially in the form of independent component analysis (ICA). However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data. Recently, we have shown that this problem can be solved by using additional information, in particular in the form of temporal structure or some additional observed variable. Our methods were originally based on the "self-supervised" learning increasingly used in deep learning, but in more recent work, we have provided likelihood-based approaches. In particular, we have developed computational methods for efficient maximization of the likelihood for two variants of the model, based on variational inference or Riemannian relative gradients, respectively.

Wed, 21 Apr 2021

09:00
Virtual

### Learning developmental path signature features with deep learning framework for infant cognitive scores prediction

Xin Zhang
(South China University of Technology)
Further Information
Abstract

The path signature has unique advantages in extracting high-order differential features of sequential data. Our team has been studying path signature theory and has actively applied it to various problems, including infant cognitive score prediction, human motion recognition, handwritten character recognition, handwritten text line recognition and writer identification. In this talk, I will share our most recent work on infant cognitive score prediction using deep path signatures. The cognitive score can reveal an individual’s abilities in intelligence, motion and language. Recent research has discovered that cognitive ability is closely related to an individual’s cortical structure and its development. We have proposed two frameworks to predict the cognitive score with different path signature features. For the first framework, we construct the temporal path signature along the age growth and extract signature features of developmental infant cortical features. By incorporating the cortical path signature into a multi-stream deep learning model, the individual cognitive score can be predicted even in the presence of missing data. For the second framework, we propose a deep path signature algorithm to compute the developmental feature and obtain the developmental connectivity matrix. We then design a graph convolutional network for the score prediction. These two frameworks have been tested on two in-house cognitive data sets and achieve state-of-the-art results.
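To make the notion of signature features concrete, the sketch below computes the first two signature levels of a piecewise-linear path using Chen's identity, together with the Lévy area (the antisymmetric part of level two). It is a generic illustration with assumed function names and a toy path, not the deep path signature models described in the talk.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (T, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending a straight segment with increment delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

def levy_area(level2):
    """Antisymmetric part of the second signature level."""
    return 0.5 * (level2 - level2.T)

# Toy example (assumed): the unit square traversed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
s1, s2 = signature_level2(square)
area = levy_area(s2)[0, 1]
```

Level one is the total increment of the path, while the Lévy area records the signed area it sweeps out; features of this kind depend on the order in which the data arrive.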

Thu, 25 Mar 2021

16:00 - 17:00
Virtual

### Asymptotic windings of the block determinants of a unitary Brownian motion and related diffusions

Fabrice Baudoin
(University of Connecticut)
Further Information
Abstract

We study several matrix diffusion processes constructed from a unitary Brownian motion. In particular, we use the Stiefel fibration to lift the Brownian motion of the complex Grassmannian to the complex Stiefel manifold and deduce a skew-product decomposition of the Stiefel Brownian motion. As an application, we prove asymptotic laws for the determinants of the block entries of the unitary Brownian motion.

Thu, 11 Mar 2021

16:00 - 17:00
Virtual

### Oriented areas and chain of offsets models

Yuliy Baryshnikov
(University of Illinois)
Further Information

Thu, 04 Mar 2021

16:00 - 17:00
Virtual

### Machine Learning for Partial Differential Equations

Michael Brenner
(Harvard University)
Further Information
Abstract

Our understanding of, and ability to compute, the solutions to nonlinear partial differential equations have been strongly curtailed by our inability to effectively parameterize the inertial manifold of their solutions. I will discuss our ongoing efforts to use machine learning to advance the state of the art, both for developing a qualitative understanding of "turbulent" solutions and for efficient computational approaches. We aim to learn parameterizations of the solutions that give more insight into the dynamics and/or increase computational efficiency. I will discuss our recent work using machine learning to develop models of the small scale behavior of spatio-temporal complex solutions, with the goal of maintaining accuracy albeit at a highly reduced computational cost relative to a full simulation. References: https://www.pnas.org/content/116/31/15344 and https://arxiv.org/pdf/2102.01010.pdf

Thu, 25 Feb 2021

16:00 - 17:00
Virtual

### Discrete-time signatures and randomness in reservoir computing (joint work with Christa Cuchiero, Lukas Gonon, Lyudmila Grigoryeva, Juan-Pablo Ortega)

Josef Teichmann
(ETH Zurich)
Further Information
Abstract

A new explanation of the geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what are called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided.
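The basic recipe of a fixed random reservoir with a trained linear readout can be sketched as below, using a state-affine system of the form `x_t = (A0 + A1 z_t) x_{t-1} + (b0 + b1 z_t)`. The Gaussian coefficients, the contraction factor 0.9, the reservoir dimension and the target filter are all assumptions for illustration; the sketch does not implement the random-projection construction or the specific distributions derived in the talk.

```python
import numpy as np

def random_state_affine_reservoir(dim, rng):
    """Random coefficients of x_t = (A0 + A1 z_t) x_{t-1} + (b0 + b1 z_t)."""
    A0, A1 = rng.normal(size=(2, dim, dim))
    # Rescale so that ||A0 + A1 z|| <= 0.9 whenever |z| <= 1 (a contraction).
    total = np.linalg.norm(A0, 2) + np.linalg.norm(A1, 2)
    A0, A1 = 0.9 * A0 / total, 0.9 * A1 / total
    b0, b1 = rng.normal(size=(2, dim))
    return A0, A1, b0, b1

def run_reservoir(inputs, coeffs):
    """Drive the fixed reservoir with a scalar input sequence; returns all states."""
    A0, A1, b0, b1 = coeffs
    x = np.zeros(b0.shape[0])
    states = []
    for z in inputs:
        x = (A0 + z * A1) @ x + b0 + z * b1
        states.append(x)
    return np.array(states)

def train_readout(states, targets):
    """Least-squares linear readout with an intercept; the reservoir stays fixed."""
    features = np.hstack([states, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights, features @ weights

rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=500)
previous = np.concatenate([[0.0], inputs[:-1]])
targets = 0.5 * inputs + 0.3 * previous ** 2  # hypothetical fading-memory filter
coeffs = random_state_affine_reservoir(20, rng)
states = run_reservoir(inputs, coeffs)
weights, fitted = train_readout(states, targets)
train_mse = float(np.mean((fitted - targets) ** 2))
```

Approximating a different filter only requires calling `train_readout` again with new targets; the reservoir coefficients and the recorded states are reused unchanged.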

Wed, 17 Feb 2021

09:00 - 10:00
Virtual

### Path Development and the Length Conjecture

Xi Geng
(University of Melbourne)
Further Information
Abstract

It was implicitly conjectured by Hambly-Lyons in 2010, and made explicit by Chang-Lyons-Ni in 2018, that the length of a tree-reduced path with bounded variation can be recovered from its signature asymptotics. Apart from its intrinsic elegance, understanding such a phenomenon is also important for the study of signature lower bounds and may shed light on more general signature inversion properties. In this talk, we discuss how the idea of path development onto suitably chosen Lie groups can be used to study this problem as well as its rough path analogue.

Thu, 26 Nov 2020

16:00 - 17:00
Virtual

### On the Happy Marriage of Kernel Methods and Deep Learning

Julien Mairal
(Inria Grenoble)
Further Information

datasig.ox.ac.uk/events

Abstract

In this talk, we present simple ideas to combine nonparametric approaches based on positive definite kernels with deep learning models. There are many good reasons for bridging these two worlds. On the one hand, we want to provide regularization mechanisms and a geometric interpretation to deep learning models, as well as a functional space that allows us to study their theoretical properties (e.g. invariance and stability). On the other hand, we want to bring more adaptivity and scalability to traditional kernel methods, both of which they crucially lack. We will start this presentation by introducing models to represent graph data, then move to biological sequences and images, showing that our hybrid models can achieve state-of-the-art results for many predictive tasks, especially when large amounts of annotated data are not available. This presentation is based on joint work with Alberto Bietti, Dexiong Chen, and Laurent Jacob.

Thu, 12 Nov 2020

16:00 - 17:00
Virtual

### Understanding Concentration and Separation in Deep Neural Networks

Stéphane Mallat
(College de France)
Further Information
Abstract

Deep convolutional networks achieve spectacular performance that remains largely unexplained. Numerical experiments show that they classify by progressively concentrating each class in separate regions of a low-dimensional space. To explain these properties, we introduce a concentration and separation mechanism with multiscale tight frame contractions. Applications are shown for image classification and statistical physics models of cosmological structures and turbulent fluids.