Forthcoming events in this series


Thu, 05 Dec 2024

16:00 - 17:00
Virtual

Transportation market rate forecast using signature transform

Dr Xin Guo
Further Information
Abstract

Freight transportation marketplace rates are typically challenging to forecast accurately. In this talk, I will present a novel statistical technique based on signature transforms and a predictive and adaptive model to forecast these marketplace rates. Our technique rests on two key elements of the signature transform: its universal nonlinearity property, which linearizes the feature space and hence translates the forecasting problem into linear regression, and the signature kernel, which allows for computationally efficient comparison of similarities between time series. Combined, these allow for efficient feature generation and precise identification of seasonality and regime switching in the forecasting process.

An algorithm based on our technique has been deployed by Amazon trucking operations, with far superior forecast accuracy and better interpretability than commercially available industry models, even during the COVID-19 pandemic and the Ukraine conflict. Furthermore, our technique is in production at Amazon and has been adopted for Amazon finance planning, with an estimated annualized saving of $50MM in the transportation sector alone.
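
For illustration only, here is a minimal sketch of the "signature features, then linear regression" recipe described above; it computes depth-2 signatures of piecewise-linear paths directly in NumPy and fits an ordinary linear model on synthetic data. It is not the production model, and all data and variable names are invented.

```python
# Minimal sketch: depth-2 signature features of piecewise-linear paths fed into linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path given as an (n, d) array."""
    inc = np.diff(path, axis=0)                      # segment increments, shape (n-1, d)
    s1 = inc.sum(axis=0)                             # level 1: total increment
    csum = np.vstack([np.zeros(path.shape[1]), np.cumsum(inc, axis=0)[:-1]])
    s2 = csum.T @ inc + 0.5 * inc.T @ inc            # level 2: sum_{k<l} inc_k (x) inc_l + 1/2 sum_k inc_k (x) inc_k
    return np.concatenate([s1, s2.ravel()])

rng = np.random.default_rng(0)
paths = [np.cumsum(rng.normal(size=(50, 2)), axis=0) for _ in range(200)]   # toy 2-d time series
X = np.stack([signature_level2(p) for p in paths])
y = X @ rng.normal(size=X.shape[1]) + 0.1 * rng.normal(size=len(paths))     # synthetic targets

model = LinearRegression().fit(X, y)                 # universal nonlinearity => linear model on signatures
print("R^2 on the toy data:", model.score(X, y))
```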

Thu, 04 Apr 2024

16:00 - 17:00
Virtual

Differential Equation-inspired Deep Learning for Node Classification and Spatiotemporal Forecasting

Noseong Park
Further Information
Abstract

Scientific knowledge, written in the form of differential equations, plays a vital role in various deep learning fields. In this talk, I will present a graph neural network (GNN) design based on reaction-diffusion equations, which addresses the notorious oversmoothing problem of GNNs. Since the self-attention of Transformers can also be viewed as a special case of graph processing, I will present how we can enhance Transformers in a similar way. I will also introduce a spatiotemporal forecasting model based on neural controlled differential equations (NCDEs). NCDEs were designed to process irregular time series in a continuous manner; for spatiotemporal processing, they need to be combined with a spatial processing module, i.e., a GNN. I will show how this can be done.
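
Purely as a toy illustration of the reaction-diffusion idea (an editorial sketch, not the speaker's model), the snippet below iterates a layer that combines graph diffusion (the smoothing part) with a pointwise nonlinear reaction term, the kind of combination meant to counteract oversmoothing. Graph, features and coefficients are invented.

```python
# Toy reaction-diffusion style GNN update (illustrative only).
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def reaction_diffusion_layers(A, X, alpha=0.5, beta=0.5, dt=0.2, steps=20):
    A_norm = normalized_adjacency(A)
    for _ in range(steps):
        diffusion = A_norm @ X - X        # diffusion term: smooths features across edges (oversmoothing source)
        reaction = X * (1.0 - X)          # logistic-type reaction term: keeps features from collapsing
        X = X + dt * (alpha * diffusion + beta * reaction)
    return X

rng = np.random.default_rng(1)
A = (rng.random((10, 10)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T            # random undirected graph
X = rng.random((10, 4))                   # node features
print(reaction_diffusion_layers(A, X).shape)
```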

Thu, 21 Mar 2024

16:00 - 17:00
Virtual

Data-driven surrogate modelling for astrophysical simulations: from stellar winds to supernovae

Jeremy Yates and Frederik De Ceuster
(University College London)
Further Information
Abstract

The feedback loop between simulations and observations is the driving force behind almost all discoveries in astronomy. However, as technological innovations allow us to create ever more complex simulations and make ever more detailed observations, it becomes increasingly difficult to combine the two: since we cannot do controlled experiments, we need to simulate whatever we can observe. This requires efficient simulation pipelines, including (general-relativistic-)(magneto-)hydrodynamics, particle physics, chemistry, and radiation transport. In this talk, we explore the challenges associated with these modelling efforts and discuss how adopting data-driven surrogate modelling, together with proper control over model uncertainties, promises to unlock a gold mine of future discoveries. For instance, the application to stellar wind simulations can teach us about the origin of chemistry in our Universe and the building blocks for life, while supernova simulations can reveal exotic states of matter and elucidate the formation of black holes.
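
To make the surrogate-modelling idea concrete, here is a minimal Gaussian-process emulator of an expensive simulator, with predictive uncertainty. The simulator and its inputs are invented stand-ins; this is not the pipeline discussed in the talk.

```python
# Minimal sketch of a data-driven surrogate with uncertainty (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulation(theta):
    """Stand-in for an expensive simulation returning a scalar summary output."""
    return np.sin(3.0 * theta) + 0.5 * theta**2

# A small design of training runs of the simulator.
theta_train = np.linspace(0.0, 2.0, 12).reshape(-1, 1)
y_train = expensive_simulation(theta_train).ravel()

kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
surrogate = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(theta_train, y_train)

# Cheap surrogate predictions with uncertainty, usable inside an observation-fitting loop.
theta_new = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
mean, std = surrogate.predict(theta_new, return_std=True)
print("max predictive std:", std.max())
```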

Thu, 15 Feb 2024

16:00 - 17:00
Virtual

From Lévy's stochastic area formula to universality of affine and polynomial processes via signature SDEs

Christa Cuchiero
(University of Vienna)
Further Information
Abstract

A plethora of stochastic models used in particular in mathematical finance, but also in population genetics and physics, stems from the class of affine and polynomial processes. The history of these processes is on the one hand closely connected with the important concept of tractability, that is, a substantial reduction of computational effort due to special structural features, and on the other hand with a unifying framework for a large number of probabilistic models. One early instance in the literature where this unifying affine and polynomial point of view can be applied is Lévy's stochastic area formula. Starting from this example, we present a guided tour through the main properties and recent results, which lead to signature stochastic differential equations (SDEs). They constitute a large class of stochastic processes, here driven by Brownian motions, whose characteristics are entire or real-analytic functions of their own signature, i.e. of iterated integrals of the process with itself, and therefore allow for generic path dependence. We show that their prolongation with the corresponding signature is an affine and polynomial process taking values in subsets of group-like elements of the extended tensor algebra. Signature SDEs are thus a class of stochastic processes which is universal within Itô processes with path-dependent characteristics and which, thanks to the affine theory, allows for a relatively explicit characterization of the Fourier-Laplace transform and hence of the full law on path space.
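
For context, the classical statement of Lévy's stochastic area formula for a two-dimensional Brownian motion (B^1, B^2) is recalled below; this is the standard textbook formula, not taken from the abstract.

```latex
% Lévy's stochastic area formula (classical statement, recalled for context).
A_t \;=\; \frac{1}{2}\int_0^t \bigl(B^1_s\,\mathrm{d}B^2_s - B^2_s\,\mathrm{d}B^1_s\bigr),
\qquad
\mathbb{E}\bigl[\exp(i\lambda A_t)\bigr] \;=\; \frac{1}{\cosh(\lambda t/2)}.
```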

Thu, 25 Jan 2024

16:00 - 17:00
Virtual

An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning

Anastasis Kratsios
Further Information
Abstract

We build universal approximators of continuous maps between arbitrary Polish metric spaces X and Y using universal approximators between Euclidean spaces as building blocks. Earlier results assume that the output space Y is a topological vector space. We overcome this limitation by "randomization": our approximators output discrete probability measures over Y. When X and Y are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of X and Y. For barycentric Y, including Banach spaces, R-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to Y-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning. 

As an application, we show that the solution operator to an RDE can be approximated within our framework.

Based on the following articles: 

An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning (2023) - Chong Liu, Matti Lassas, Maarten V. de Hoop, and Ivan Dokmanić (arXiv 2304.12231)

Designing universal causal deep learning models: The geometric (Hyper)transformer (2023) - B. Acciaio, A. Kratsios, and G. Pammer, Math. Fin. https://onlinelibrary.wiley.com/doi/full/10.1111/mafi.12389

Universal Approximation Under Constraints is Possible with Transformers (2022) - ICLR Spotlight - A. Kratsios, B. Zamanlooy, T. Liu, and I. Dokmanić.

 

Thu, 01 Dec 2022
16:00
Virtual

Particle filters for Data Assimilation

Dan Crisan
(Imperial College London)

Note: we recommend joining the meeting using the Teams client for the best user experience.

Further Information
Abstract

Modern Data Assimilation (DA) can be traced back to the sixties and owes a lot to earlier developments in linear filtering theory. Since then, DA has evolved independently of filtering theory. To date, it is a massively important area of research due to its many applications in meteorology, ocean prediction, hydrology, oil reservoir exploration, etc. The field has been largely driven by practitioners; however, in recent years an increasing body of theoretical work has been devoted to it. In this talk, I will advocate the interpretation of DA through the language of stochastic filtering. This interpretation allows us to make use of advanced particle filters to produce rigorously validated DA methodologies. I will present a particle filter that incorporates three additional add-on procedures: nudging, tempering and jittering. The particle filter is tested on a two-layer quasi-geostrophic model with O(10^6) degrees of freedom, out of which only a minute fraction are noisily observed.
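
As a bare-bones reference point, here is a textbook bootstrap particle filter with optional jittering on a toy scalar state-space model. It is not the nudging/tempering/jittering filter of the talk; model, parameters and data are invented.

```python
# Minimal bootstrap particle filter with jittering (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=500, sigma_x=0.5, sigma_y=0.3, jitter=0.01):
    particles = rng.normal(0.0, 1.0, size=n_particles)          # initial ensemble
    means = []
    for y in observations:
        # 1. Propagate particles through the (toy) state dynamics.
        particles = 0.9 * particles + sigma_x * rng.normal(size=n_particles)
        # 2. Weight by the observation likelihood.
        log_w = -0.5 * ((y - particles) / sigma_y) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # 3. Resample and jitter to fight particle degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx] + jitter * rng.normal(size=n_particles)
    return np.array(means)

# Toy data generated from the same model.
true_x, ys = 0.0, []
for _ in range(100):
    true_x = 0.9 * true_x + 0.5 * rng.normal()
    ys.append(true_x + 0.3 * rng.normal())
print(particle_filter(np.array(ys))[:5])
```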

Thu, 24 Nov 2022
16:00
Virtual

The Legendre Memory Unit: A neural network with optimal time series compression

Chris Eliasmith
(University of Waterloo)

Note: we recommend joining the meeting using the Teams client for the best user experience.

Further Information
Abstract

We have recently proposed a new kind of neural network, called the Legendre Memory Unit (LMU), which is provably optimal for compressing streaming time series data. In this talk, I will describe this network and a variety of state-of-the-art results that have been set using the LMU. I will include recent results on speech and language applications that demonstrate significant improvements over transformers. I will also discuss variants of the original LMU that permit effective scaling on current GPUs and hold promise for extremely efficient edge time series processing.
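
A rough sketch of the LMU memory recurrence is below. The (A, B) matrices follow an editorial recollection of the Legendre delay-system construction in the LMU paper and should be treated as an assumption, and the forward-Euler discretization is a simplification of the update typically used in practice.

```python
# Sketch of an LMU-style memory update (matrices per recollection of the LMU paper; treat as an assumption).
import numpy as np

def lmu_matrices(order, theta):
    """Continuous-time (A, B) of a Legendre delay system of given order and window length theta."""
    A = np.zeros((order, order))
    for r in range(order):
        for c in range(order):
            A[r, c] = (2 * r + 1) * (-1.0 if r < c else (-1.0) ** (r - c + 1))
    i = np.arange(order)
    B = (2 * i + 1) * (-1.0) ** i
    return A / theta, B / theta

def run_memory(u, order=8, theta=40.0, dt=1.0):
    A, B = lmu_matrices(order, theta)
    A_bar = np.eye(order) + dt * A            # forward-Euler discretization (simplification)
    B_bar = dt * B
    m = np.zeros(order)
    states = []
    for u_t in u:                             # m_t encodes a sliding window of u via Legendre coefficients
        m = A_bar @ m + B_bar * u_t
        states.append(m.copy())
    return np.array(states)

u = np.sin(np.linspace(0, 10, 200))
print(run_memory(u).shape)                    # (200, 8)
```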

Thu, 03 Nov 2022
16:00
Virtual

Signatures and Functional Expansions

Bruno Dupire
(Bloomberg)

Note: we recommend joining the meeting using the Teams client for the best user experience.

Further Information
Abstract

European option payoffs can be generated by combinations of hockey-stick payoffs or of monomials. Interestingly, path-dependent options can be generated by combinations of signatures, which are the building blocks of path dependence. We focus on the case of one asset together with time, typically the evolution of the price x as a function of the time t. The signature of a path for a given word with letters in the alphabet {t,x} (sometimes called the augmented signature of dimension 1) is an iterated Stratonovich integral with respect to the letters of the word, and it plays the role of a monomial in a Taylor expansion. For a given time horizon T, the signature elements associated to short words are contained in the linear space generated by the signature elements associated to longer words, and we construct an incremental basis of signature elements. It allows writing a smooth path-dependent payoff as a converging series of signature elements, a result stronger than the density property of signature elements obtained from the Stone-Weierstrass theorem. We recall the main concepts of the Functional Itô Calculus, a natural framework to model path dependence, and draw links between two approximation results, the Taylor expansion and the Wiener chaos decomposition. The Taylor expansion is obtained by iterating the functional Stratonovich formula, whilst the Wiener chaos decomposition is obtained by iterating the functional Itô formula applied to a conditional expectation. We also establish the pathwise Intrinsic Expansion and link it to the Functional Taylor Expansion.

Wed, 29 Jun 2022

16:00 - 17:00

Information theory with kernel methods

Francis Bach
(INRIA - Ecole Normale Supérieure)
Further Information
Abstract

I will consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. In this talk, I will show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. I will also present how these new notions of relative entropy lead to new upper bounds on log-partition functions, which can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods (based on https://arxiv.org/pdf/2202.08545.pdf, see also https://francisbach.com/information-theory-with-kernel-methods/).
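
A quick empirical sketch of the central object (not the estimators developed in the paper): the kernel covariance operator of a sample shares its nonzero eigenvalues with the scaled Gram matrix, so a plug-in von Neumann entropy can be computed from Gram-matrix eigenvalues as follows. Kernel choice and data are arbitrary.

```python
# Plug-in von Neumann entropy of an empirical kernel covariance operator (illustrative sketch).
import numpy as np

def rbf_gram(X, lengthscale=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * lengthscale**2))

def von_neumann_entropy(X, lengthscale=1.0):
    n = X.shape[0]
    K = rbf_gram(X, lengthscale) / n            # same nonzero spectrum as the empirical covariance operator
    evals = np.linalg.eigvalsh(K)
    evals = evals[evals > 1e-12]
    evals = evals / evals.sum()                 # normalize to unit trace before taking the entropy
    return float(-(evals * np.log(evals)).sum())

rng = np.random.default_rng(0)
print(von_neumann_entropy(rng.normal(size=(300, 2))))
```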

Thu, 26 May 2022

16:00 - 17:00
Virtual

Tensor Product Kernels for Independence

Zoltan Szabo
(London School of Economics)
Further Information
Abstract

The Hilbert-Schmidt independence criterion (HSIC) is among the most widely used approaches in machine learning and statistics to measure the independence of random variables. Despite its popularity and success in numerous applications, quite little is known about when HSIC characterizes independence. I am going to provide a complete answer to this question, with conditions which are often easy to verify in practice.

This talk is based on joint work with Bharath Sriperumbudur.
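
For readers unfamiliar with the statistic itself, a biased empirical HSIC with Gaussian kernels takes only a few lines. This is the standard estimator as an illustrative sketch, unrelated to the characterization results of the talk.

```python
# Biased empirical HSIC with Gaussian kernels (standard estimator, illustrative sketch).
import numpy as np

def rbf_gram(Z, lengthscale=1.0):
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * lengthscale**2))

def hsic(X, Y, lengthscale=1.0):
    n = X.shape[0]
    K, L = rbf_gram(X, lengthscale), rbf_gram(Y, lengthscale)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return float(np.trace(K @ H @ L @ H) / n**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
print("independent:", hsic(X, rng.normal(size=(500, 1))))
print("dependent:  ", hsic(X, X + 0.1 * rng.normal(size=(500, 1))))
```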

Wed, 20 Apr 2022

09:00 - 10:00
Virtual

Optimization, Speed-up, and Out-of-distribution Prediction in Deep Learning

Wei Chen
(Chinese Academy of Sciences)
Further Information
Abstract

In this talk, I will introduce our investigations on how to make deep learning easier to optimize, faster to train, and more robust to out-of-distribution prediction. Specifically, we design a group-invariant optimization framework for ReLU neural networks; we compensate for the gradient delay in asynchronous distributed training; and we improve out-of-distribution prediction by incorporating “causal” invariance.

Thu, 24 Mar 2022

16:00 - 17:00
Virtual

The Geometry of Linear Convolutional Networks

Kathlén Kohn
(KTH Royal Institute of Technology)
Further Information
Abstract

We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (that is, the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the function space.

For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent.

This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.

Thu, 10 Feb 2022

16:00 - 17:00
Virtual

Non-Parametric Estimation of Manifolds from Noisy Data

Yariv Aizenbud
(Yale University)
Further Information
Abstract

In many data-driven applications, the data follows some geometric structure, and the goal is to recover this structure. In many cases, the observed data is noisy and the recovery task is even more challenging. A common assumption is that the data lies on a low dimensional manifold. Estimating a manifold from noisy samples has proven to be a challenging task. Indeed, even after decades of research, there was no (computationally tractable) algorithm that accurately estimates a manifold from noisy samples with a constant level of noise.

In this talk, we will present a method that estimates a manifold and its tangent. Moreover, we establish convergence rates, which are essentially as good as existing convergence rates for function estimation.

This is a joint work with Barak Sober.

Thu, 03 Feb 2022

16:00 - 17:00
Virtual

Optimal Thinning of MCMC Output

Chris Oates
(Newcastle University)
Further Information
Abstract

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Here we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.
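
The abstract mentions the Stein Thinning package; rather than guess its exact API, here is a from-scratch sketch of the underlying idea, greedy minimization of a kernel Stein discrepancy for a target with known score function (a standard Gaussian here), using a standard Langevin Stein kernel built on an RBF base kernel. The "MCMC output" below is just i.i.d. noise standing in for a sample path.

```python
# From-scratch sketch of greedy kernel-Stein-discrepancy thinning (not the Stein Thinning package API).
import numpy as np

def stein_kernel(X, score, lengthscale=1.0):
    """Langevin Stein kernel matrix for an RBF base kernel and score s(x) = grad log p(x)."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]                       # (n, n, d)
    sq = (diff ** 2).sum(-1)
    K = np.exp(-sq / (2 * lengthscale**2))
    S = score(X)                                               # (n, d)
    term1 = d / lengthscale**2 - sq / lengthscale**4           # div_x div_y of the base kernel
    term2 = ((S[:, None, :] - S[None, :, :]) * diff).sum(-1) / lengthscale**2
    term3 = S @ S.T
    return K * (term1 + term2 + term3)

def stein_thin(X, score, m):
    """Greedily pick m points so that the KSD of the selected set stays small."""
    K0 = stein_kernel(X, score)
    chosen, running = [], np.zeros(X.shape[0])
    for _ in range(m):
        obj = np.diag(K0) + 2.0 * running                      # KSD increase from adding each candidate
        i = int(np.argmin(obj))
        chosen.append(i)
        running += K0[:, i]
    return X[chosen]

rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 2))                           # stand-in for MCMC output
thinned = stein_thin(samples, score=lambda x: -x, m=20)        # score of a standard Gaussian target
print(thinned.shape)
```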

Thu, 27 Jan 2022

16:00 - 17:00
Virtual

Learning Homogenized PDEs in Continuum Mechanics

Andrew Stuart
(Caltech)
Further Information
Abstract

Neural networks have shown great success at learning function approximators between spaces X and Y, in the setting where X is a finite dimensional Euclidean space and where Y is either a finite dimensional Euclidean space (regression) or a set of finite cardinality (classification); the neural networks learn the approximator from N data pairs {x_n, y_n}. In many problems arising in the physical and engineering sciences it is desirable to generalize this setting to learn operators between spaces of functions X and Y. The talk will overview recent work in this context.

Then the talk will focus on work aimed at addressing the problem of learning operators which define the constitutive model characterizing the macroscopic behaviour of multiscale materials arising in material modeling. Mathematically this corresponds to using machine learning to determine appropriate homogenized equations, using data generated at the microscopic scale. Applications to visco-elasticity and crystal-plasticity are given.

Thu, 13 Jan 2022

16:00 - 17:00
Virtual

Regularity structures and machine learning

Ilya Chevyrev
(Edinburgh University)
Further Information
Abstract

In many machine learning tasks, it is crucial to extract low-dimensional and descriptive features from a data set. In this talk, I present a method to extract features from multi-dimensional space-time signals which is motivated, on the one hand, by the success of path signatures in machine learning, and on the other hand, by the success of models from the theory of regularity structures in the analysis of PDEs. I will present a flexible definition of a model feature vector along with numerical experiments in which we combine these features with basic supervised linear regression to predict solutions to parabolic and dispersive PDEs with a given forcing and boundary conditions. Interestingly, in the dispersive case, the prediction power relies heavily on whether the boundary conditions are appropriately included in the model. The talk is based on the following joint work with Andris Gerasimovics and Hendrik Weber: https://arxiv.org/abs/2108.05879

Wed, 12 Jan 2022

09:00 - 10:00
Virtual

Learning and Learning to Solve PDEs

Bin Dong
(Peking University)
Further Information
Abstract

Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, etc. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on some recent impacts of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations, and how it may guide us in designing new models and algorithms for some scientific computing tasks. On the one hand, I will present some of our works on the design of interpretable data-driven models for system identification and model reduction. On the other hand, I will present our recent attempts at combining wisdom from numerical PDEs and machine learning to design data-driven solvers for PDEs and their applications in electromagnetic simulation.

Thu, 14 Oct 2021

16:00 - 17:00
Virtual

Kernel-based Statistical Methods for Functional Data

George Wynne
(Imperial College London)
Further Information

www.datasig.ac.uk/events

Abstract

Kernel-based statistical algorithms have found wide success in statistical machine learning in the past ten years as a non-parametric, easily computable engine for reasoning with probability measures. The main idea is to use a kernel to map probability measures, the objects of interest, into well-behaved spaces where calculations can be carried out. This methodology has found wide application, for example in two-sample testing, independence testing, goodness-of-fit testing, parameter inference and MCMC thinning. Most theoretical investigations and practical applications have focused on Euclidean data. This talk will outline work that adapts the kernel-based methodology to data in an arbitrary Hilbert space, which opens the door to applications for functional data, where a single data sample is a discretely observed function, for example a time series or a random surface. Such data is becoming increasingly prominent within the statistical community and in machine learning. Emphasis shall be given to the two-sample and goodness-of-fit testing problems.
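
As a small illustration of the kernel two-sample idea applied to discretely observed functions (a generic sketch, not the Hilbert-space theory developed in the talk), one can treat each discretized curve as a vector, apply a kernel on those vectors, and calibrate the MMD statistic by permutation. Curves and parameters below are invented.

```python
# Kernel two-sample (MMD) permutation test on discretely observed curves (illustrative sketch).
import numpy as np

def rbf(a, b, lengthscale):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * lengthscale**2))

def mmd2(X, Y, lengthscale=3.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, lengthscale), rbf(Y, Y, lengthscale), rbf(X, Y, lengthscale)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
X = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(40, 50))         # sample of noisy curves
Y = np.sin(2 * np.pi * t + 0.3) + 0.3 * rng.normal(size=(40, 50))   # slightly shifted curves

stat = mmd2(X, Y)
Z = np.vstack([X, Y])
perm_stats = []
for _ in range(200):                                   # permutation null distribution
    idx = rng.permutation(len(Z))
    perm_stats.append(mmd2(Z[idx[:40]], Z[idx[40:]]))
print("p-value:", np.mean(np.array(perm_stats) >= stat))
```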

Wed, 22 Sep 2021

09:00 - 10:00
Virtual

Stochastic Flows and Rough Differential Equations on Foliated Spaces

Yuzuru Inahama
(Kyushu University)
Further Information
Abstract

Stochastic differential equations (SDEs) on compact foliated spaces were introduced a few years ago. As a corollary, a leafwise Brownian motion on a compact foliated space was obtained as a solution to an SDE. In this work we construct stochastic flows associated with the SDEs by using rough path theory, which is something like a 'deterministic version' of Itô's SDE theory.

This is joint work with Kiyotaka Suzaki.

Wed, 08 Sep 2021

09:00 - 10:00
Virtual

Co-clustering Analysis of Multidimensional Big Data

Hong Yan
(City University of Hong Kong)
Further Information
Abstract

Although a multidimensional data array can be very large, it may contain coherence patterns much smaller in size. For example, we may need to detect a subset of genes that co-express under a subset of conditions. In this presentation, we discuss our recently developed co-clustering algorithms for the extraction and analysis of coherent patterns in big datasets. In our method, a co-cluster, corresponding to a coherent pattern, is represented as a low-rank tensor and it can be detected from the intersection of hyperplanes in a high dimensional data space. Our method has been used successfully for DNA and protein data analysis, disease diagnosis, drug therapeutic effect assessment, and feature selection in human facial expression classification. Our method can also be useful for many other real-world data mining, image processing and pattern recognition applications.

Thu, 10 Jun 2021

16:00 - 17:00
Virtual

Refining Data-Driven Market Simulators and Managing their Risks

Blanka Horvath
(King's College London)
Further Information
Abstract

Techniques that address sequential data have been a central theme in machine learning research in recent years. More recently, such considerations have entered the field of finance-related ML applications in several areas where we face inherently path-dependent problems: from (deep) pricing and hedging (of path-dependent options) to generative modelling of synthetic market data, which we refer to as market generation.

We revisit Deep Hedging from the perspective of the role of the data streams used for training and highlight how this perspective motivates the use of highly-accurate generative models for synthetic data generation. From this, we draw conclusions regarding the implications for risk management and model governance of these applications, in contrast to risk management in classical quantitative finance approaches.

Indeed, financial ML applications and their risk management heavily rely on a solid means of measuring and efficiently computing (similarity-)metrics between datasets consisting of sample paths of stochastic processes. Stochastic processes are at their core random variables with values on path space. However, while the distance between two (finite dimensional) distributions was historically well understood, the extension of this notion to the level of stochastic processes remained a challenge until recently. We discuss the effect of different choices of such metrics while revisiting some topics that are central to ML-augmented quantitative finance applications (such as the synthetic generation and the evaluation of similarity of data streams) from a regulatory (and model governance) perspective. Finally, we discuss the effect of considering refined metrics which respect and preserve the information structure (the filtration) of the market and the implications and relevance of such metrics on financial results.

Thu, 03 Jun 2021

16:00 - 17:00
Virtual

Kinetic Brownian motion in the diffeomorphism group of a closed Riemannian manifold

Ismaël Bailleul
(Université de Rennes)
Further Information
Abstract

In its simplest instance, kinetic Brownian motion in R^d is a C^1 random path (m_t, v_t) with unit velocity v_t, a Brownian motion on the unit sphere run at speed a > 0. Properly time-rescaled as a function of the parameter a, its position process converges to a Brownian motion in R^d as a tends to infinity. On the other side, the motion converges to the straight-line (geodesic) motion when a goes to 0. Kinetic Brownian motion thus provides an interpolation between geodesic and Brownian flows in this setting. Think now about changing R^d for the diffeomorphism group of a fluid domain, with the velocity now a vector field on the domain. I will explain how one can prove in this setting an interpolation result similar to the previous one, giving an interpolation between Euler's equations of incompressible flows and a Brownian-like flow on the diffeomorphism group.

Thu, 13 May 2021

16:00 - 17:00
Virtual

High-dimensional, multiscale online changepoint detection

Richard Samworth
(DPMMS University of Cambridge)
Further Information
Abstract

We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package 'ocd', and we also demonstrate its utility on a seismology data set.

Thu, 06 May 2021

16:00 - 17:00
Virtual

New perspectives on rough paths, signatures and signature cumulants

Peter K Friz
(Berlin University of Technology)
Further Information
Abstract

We revisit rough paths and signatures from a geometric and "smooth model" perspective. This provides a lean framework to understand and formulate key concepts of the theory, including recent insights on higher-order translation, also known as renormalization of rough paths. This first part is joint work with C Bellingeri (TU Berlin), and S Paycha (U Potsdam). In a second part, we take a semimartingale perspective and more specifically analyze the structure of expected signatures when written in exponential form. Following Bonnier-Oberhauser (2020), we call the resulting objects signature cumulants. These can be described - and recursively computed - in a way that can be seen as unification of previously unrelated pieces of mathematics, including Magnus (1954), Lyons-Ni (2015), Gatheral and coworkers (2017 onwards) and Lacoin-Rhodes-Vargas (2019). This is joint work with P Hager and N Tapia.

Thu, 29 Apr 2021

16:00 - 17:00
Virtual

Nonlinear Independent Component Analysis: Identifiability, Self-Supervised Learning, and Likelihood

Aapo Hyvärinen
(University of Helsinki)
Further Information
Abstract

Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, especially in the form of independent component analysis (ICA). However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data. Recently, we have shown that this problem can be solved by using additional information, in particular in the form of temporal structure or some additional observed variable. Our methods were originally based on "self-supervised" learning increasingly used in deep learning, but in more recent work, we have provided likelihood-based approaches. In particular, we have developed computational methods for efficient maximization of the likelihood for two variants of the model, based on variational inference or Riemannian relative gradients, respectively.

Wed, 21 Apr 2021
09:00
Virtual

Learning developmental path signature features with deep learning framework for infant cognitive scores prediction

Xin Zhang
(South China University of Technology)
Further Information
Abstract

The path signature has unique advantages in extracting high-order differential features of sequential data. Our team has been studying path signature theory and has actively applied it to various applications, including infant cognitive score prediction, human motion recognition, handwritten character recognition, handwritten text line recognition and writer identification. In this talk, I will share our most recent work on infant cognitive score prediction using deep path signatures. The cognitive score reveals an individual's intelligence, motor and language abilities. Recent research discovered that cognitive ability is closely related to an individual's cortical structure and its development. We have proposed two frameworks to predict the cognitive score with different path signature features. In the first framework, we construct the temporal path signature along the age growth and extract signature features of developmental infant cortical features. By incorporating the cortical path signature into a multi-stream deep learning model, the individual cognitive score can be predicted even in the presence of missing data. In the second framework, we propose a deep path signature algorithm to compute the developmental features and obtain the developmental connectivity matrix, and we then design a graph convolutional network for score prediction. These two frameworks have been tested on two in-house cognitive datasets and achieve state-of-the-art results.

Thu, 25 Mar 2021

16:00 - 17:00
Virtual

Asymptotic windings of the block determinants of a unitary Brownian motion and related diffusions

Fabrice Baudoin
(University of Connecticut)
Further Information
Abstract

We study several matrix diffusion processes constructed from a unitary Brownian motion. In particular, we use the Stiefel fibration to lift the Brownian motion of the complex Grassmannian to the complex Stiefel manifold and deduce a skew-product decomposition of the Stiefel Brownian motion. As an application, we prove asymptotic laws for the determinants of the block entries of the unitary Brownian motion.

Thu, 04 Mar 2021

16:00 - 17:00
Virtual

Machine Learning for Partial Differential Equations

Michael Brenner
(Harvard University)
Further Information
Abstract

Our understanding and ability to compute the solutions to nonlinear partial differential equations have been strongly curtailed by our inability to effectively parameterize the inertial manifold of their solutions. I will discuss our ongoing efforts for using machine learning to advance the state of the art, both for developing a qualitative understanding of "turbulent" solutions and for efficient computational approaches. We aim to learn parameterizations of the solutions that give more insight into the dynamics and/or increase computational efficiency. I will discuss our recent work using machine learning to develop models of the small-scale behavior of spatio-temporal complex solutions, with the goal of maintaining accuracy albeit at a highly reduced computational cost relative to a full simulation. References: https://www.pnas.org/content/116/31/15344 and https://arxiv.org/pdf/2102.01010.pdf

Thu, 25 Feb 2021

16:00 - 17:00
Virtual

Discrete-time signatures and randomness in reservoir computing (joint work with Christa Cuchiero, Lukas Gonon, Lyudmila Grigoryeva, Juan-Pablo Ortega)

Josef Teichmann
(ETH Zurich)
Further Information
Abstract

A new explanation of the geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what are called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided.
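
To fix ideas about the general recipe (random recurrent system plus trained linear readout), here is a generic echo-state-style reservoir sketch. It is not the state-affine Volterra-series construction of the paper; the task, dimensions and scalings are arbitrary.

```python
# Generic reservoir computing sketch: random recurrent state + ridge-regression readout (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def reservoir_states(u, dim=200, spectral_radius=0.9, input_scale=0.5):
    W = rng.normal(size=(dim, dim))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # control the recurrent dynamics
    W_in = input_scale * rng.normal(size=dim)
    h, states = np.zeros(dim), []
    for u_t in u:
        h = np.tanh(W @ h + W_in * u_t)                           # fixed random dynamics, never trained
        states.append(h.copy())
    return np.array(states)

# Toy filter-learning task: predict a lagged, squared version of the input.
u = np.sin(np.linspace(0, 60, 1500)) + 0.1 * rng.normal(size=1500)
y = np.roll(u, 5) ** 2
H = reservoir_states(u)
ridge = 1e-6
readout = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)   # only the readout is trained
print("train MSE:", float(np.mean((H @ readout - y) ** 2)))
```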

Wed, 17 Feb 2021

09:00 - 10:00
Virtual

Path Development and the Length Conjecture

Xi Geng
(University of Melbourne)
Further Information
Abstract

It was implicitly conjectured by Hambly-Lyons in 2010, which was made explicit by Chang-Lyons-Ni in 2018, that the length of a tree-reduced path with bounded variation can be recovered from its signature asymptotics. Apart from its intrinsic elegance, understanding such a phenomenon is also important for the study of signature lower bounds and may shed light on more general signature inversion properties. In this talk, we discuss how the idea of path development onto suitably chosen Lie groups can be used to study this problem as well as its rough path analogue.

Thu, 26 Nov 2020

16:00 - 17:00
Virtual

On the Happy Marriage of Kernel Methods and Deep Learning

Julien Mairal
(Inria Grenoble)
Further Information

datasig.ox.ac.uk/events

Abstract

In this talk, we present simple ideas to combine nonparametric approaches based on positive definite kernels with deep learning models. There are many good reasons for bridging these two worlds. On the one hand, we want to provide regularization mechanisms and a geometric interpretation to deep learning models, as well as a functional space that allows us to study their theoretical properties (e.g. invariance and stability). On the other hand, we want to bring more adaptivity and scalability to traditional kernel methods, which they crucially lack. We will start this presentation by introducing models to represent graph data, then move to biological sequences and images, showing that our hybrid models can achieve state-of-the-art results for many predictive tasks, especially when large amounts of annotated data are not available. This presentation is based on joint works with Alberto Bietti, Dexiong Chen, and Laurent Jacob.

Thu, 12 Nov 2020

16:00 - 17:00
Virtual

Understanding Concentration and Separation in Deep Neural Networks

Stéphane Mallat
(College de France)
Further Information
Abstract

Deep convolutional networks have spectacular performances that remain mostly not understood. Numerical experiments show that they classify by progressively concentrating each class in separate regions of a low-dimensional space. To explain these properties, we introduce a concentration and separation mechanism with multiscale tight frame contractions. Applications are shown for image classification and statistical physics models of cosmological structures and turbulent fluids.

Wed, 04 Nov 2020

09:00 - 10:00
Virtual

Parametric estimation via MMD optimization: robustness to outliers and dependence

Pierre Alquier
(RIKEN)
Further Information
Abstract

In this talk, I will study the properties of parametric estimators based on the Maximum Mean Discrepancy (MMD) defined by Briol et al. (2019). First, I will show that these estimators are universal in the i.i.d. setting: even in case of misspecification, they converge to the best approximation of the distribution of the data in the model, without ANY assumption on this model. This leads to very strong robustness properties. Second, I will show that these results remain valid when the data is not independent, but instead satisfies a weak-dependence condition. This condition is based on a new dependence coefficient, which is itself defined thanks to the MMD. I will show through examples that this new notion of dependence is actually quite general. This talk is based on published works, and works in progress, with Badr-Eddine Chérief Abdellatif (ENSAE Paris), Mathieu Gerber (University of Bristol), Jean-David Fermanian (ENSAE Paris) and Alexis Derumigny (University of Twente):

http://arxiv.org/abs/1912.05737

http://proceedings.mlr.press/v118/cherief-abdellatif20a.html

http://arxiv.org/abs/2006.00840

https://arxiv.org/abs/2010.00408

https://cran.r-project.org/web/packages/MMDCopula/
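
As a toy illustration of the minimum-MMD principle described above (a generic sketch of the idea, not the algorithms of the cited papers), one can estimate a location parameter by picking the model whose simulated sample is closest to the data in MMD; the robustness to gross outliers is visible even in this crude grid search.

```python
# Toy minimum-MMD estimation of a Gaussian location parameter, with outliers (illustrative sketch).
import numpy as np

def rbf(a, b, lengthscale=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * lengthscale**2))

def mmd2(x, y):
    m, n = len(x), len(y)
    Kxx, Kyy, Kxy = rbf(x, x), rbf(y, y), rbf(x, y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(2.0, 1.0, 950), rng.normal(20.0, 1.0, 50)])  # 5% gross outliers

grid = np.linspace(0.0, 5.0, 101)
model_noise = rng.normal(size=500)                       # common random numbers across candidates
losses = [mmd2(data, theta + model_noise) for theta in grid]
print("minimum-MMD estimate:", grid[int(np.argmin(losses))])   # close to 2, despite the outliers
print("sample mean:         ", data.mean())                    # pulled towards the outliers
```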

Thu, 22 Oct 2020

14:00 - 15:00
Virtual

Classifier-based Distribution-Dissimilarities: From Maximum Mean Discrepancies to Adversarial Examples

Carl-Johann Simon-Gabriel
(ETH Zurich)
Further Information

datasig.ox.ac.uk/events

Abstract

Any binary classifier (or score function) can be used to define a dissimilarity between two distributions of points with positive and negative labels. In fact, many well-known distribution dissimilarities are classifier-based dissimilarities: the total variation, the KL- or JS-divergence, the Hellinger distance, etc. And many recent popular generative modelling algorithms compute or approximate these distribution dissimilarities by explicitly training a classifier, e.g. GANs and their variants. After a brief introduction to these classifier-based dissimilarities, I will focus on the influence of the classifier's capacity. I will start with some theoretical considerations illustrated on maximum mean discrepancies (a weak form of total variation that has grown popular in machine learning) and then focus on deep feed-forward networks and their vulnerability to adversarial examples. We will see that this vulnerability is already rooted in the design and capacity of our current networks, and will discuss ideas to tackle this vulnerability in the future.

Thu, 01 Oct 2020

16:00 - 17:00
Virtual

Tropical time series, iterated-sums signatures and quasisymmetric functions

Joscha Diehl
(University of Greifswald)
Abstract

Driven by the need for principled extraction of features from time series, we introduce the iterated-sums signature over any commutative semiring. The case of the tropical semiring is a central, and our motivating, example, as it leads to features of (real-valued) time series that are not easily available using existing signature-type objects.

This is joint work with Kurusch Ebrahimi-Fard (NTNU Trondheim) and Nikolas Tapia (WIAS Berlin).
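
A toy illustration of the semiring substitution (an editorial minimal example covering only the lowest orders): replacing the usual (plus, times) by the tropical (min, plus) in the first two iterated sums of a scalar time series.

```python
# Toy iterated sums of a scalar time series in the usual and in the tropical (min, +) semiring.
import numpy as np

def iterated_sums_order2(x):
    dx = np.diff(x)
    n = len(dx)
    usual_1 = dx.sum()                                                          # sum_i dx_i
    usual_2 = sum(dx[i] * dx[j] for i in range(n) for j in range(i + 1, n))     # sum_{i<j} dx_i * dx_j
    trop_1 = dx.min()                                                           # "sum" becomes min
    trop_2 = min(dx[i] + dx[j] for i in range(n) for j in range(i + 1, n))      # "product" becomes +
    return usual_1, usual_2, trop_1, trop_2

x = np.array([0.0, 1.0, 0.5, 2.0, 1.5, 3.0])
print(iterated_sums_order2(x))
```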

Thu, 17 Sep 2020

16:00 - 17:00
Virtual

On Wasserstein projections

Jose Blanchet
(Stanford University)
Abstract

We study the minimum Wasserstein distance from the empirical measure to a space of probability measures satisfying linear constraints. This statistic can naturally be used in a wide range of applications, for example, optimally choosing uncertainty sizes in distributionally robust optimization, optimal regularization, and testing fairness, martingality, and many other statistical properties. We will discuss duality results, which recover the celebrated Kantorovich-Rubinstein duality when the manifold is sufficiently rich, and the behaviour of the associated test statistics as the sample size increases. We illustrate how this relaxation can beat the statistical curse of dimensionality often associated with empirical Wasserstein distances.

The talk builds on joint work with S. Ghosh, Y. Kang, K. Murthy, M. Squillante, and N. Si.

Thu, 03 Sep 2020

16:00 - 17:00

Topological representation learning

Michael Moor
(ETH Zurich)
Abstract

Topological features as computed via persistent homology offer a non-parametric approach to robustly capture multi-scale connectivity information of complex datasets. This has started to gain attention in various machine learning applications. Conventionally, in topological data analysis, this method has been employed as an immutable feature descriptor in order to characterize topological properties of datasets. In this talk, however, I will explore how topological features can be directly integrated into deep learning architectures. This allows us to impose differentiable topological constraints for preserving the global structure of the data space when learning low-dimensional representations.

Thu, 06 Aug 2020

16:00 - 17:00
Virtual

Path signatures in topology, dynamics and data analysis

Vidit Nanda
(University of Oxford)
Abstract

The signature of a path in Euclidean space resides in the tensor algebra of that space; it is obtained by systematic iterated integration of the components of the given path against one another. This straightforward definition conceals a host of deep theoretical properties and impressive practical consequences. In this talk I will describe the homotopical origins of path signatures, their subsequent application to stochastic analysis, and how they facilitate efficient machine learning in topological data analysis. This last bit is joint work with Ilya Chevyrev and Harald Oberhauser.

Thu, 23 Jul 2020

16:00 - 17:00
Virtual

Artificial Neural Networks and Kernel Methods

Franck Gabriel
(Ecole Polytechnique Federale de Lausanne)
Abstract

The random initialisation of artificial neural networks (ANNs) allows one to describe, in function space, the limit of the evolution of ANNs as their width tends to infinity. In this limit, an ANN is initially a Gaussian process and, during learning, follows a gradient descent convolved with a kernel called the Neural Tangent Kernel.

Connecting neural networks to the well-established theory of kernel methods allows us to understand the dynamics of neural networks and their generalization capability. In practice, it helps to select appropriate architectural features of the network to be trained. In addition, it provides new tools to address the finite-size setting.

Thu, 09 Jul 2020

16:00 - 17:00
Virtual

Characterising the set of (untruncated) signatures

Horatio Boedihardjo
(University of Reading)
Abstract

The concept of path signatures has been widely used in several areas of pure mathematics including in applications to data science. However, we remain unable to answer even the most basic questions about it. For instance, how to fully characterise the set of (untruncated) signatures of bounded variation paths? Can certain norms on signatures be related to the length of a path, like in Fourier isometry? In this talk, we will review some known results, explain the open problems and discuss their difficulties.

Thu, 25 Jun 2020

16:00 - 18:00
Virtual

Optimal execution with rough path signatures

Imanol Pérez Arribas
(Mathematical Institute University of Oxford)
Abstract

We present a method for obtaining approximate solutions to the problem of optimal execution, based on a signature method. The framework is general, only requiring that the price process is a geometric rough path and the price impact function is a continuous function of the trading speed. Following an approximation of the optimisation problem, we are able to calculate an optimal solution for the trading speed in the space of linear functions on a truncation of the signature of the price process. We provide strong numerical evidence illustrating the accuracy and flexibility of the approach. Our numerical investigation both examines cases where exact solutions are known, demonstrating that the method accurately approximates these solutions, and models where exact solutions are not known. In the latter case, we obtain favourable comparisons with standard execution strategies.

Thu, 04 Jun 2020
14:00
Virtual

A Mathematical Perspective of Machine Learning

Weinan E
(Princeton University)
Abstract

The heart of modern machine learning (ML) is the approximation of high dimensional functions. Traditional approaches, such as approximation by piecewise polynomials, wavelets, or other linear combinations of fixed basis functions, suffer from the curse of dimensionality (CoD). We will present a mathematical perspective of ML, focusing on the issue of CoD. We will discuss three major issues: approximation theory and error analysis of modern ML models, dynamics and qualitative behavior of gradient descent algorithms, and ML from a continuous viewpoint. We will see that at the continuous level, ML can be formulated as a series of reasonably nice variational and PDE-like problems. Modern ML models/algorithms, such as the random feature and two-layer and residual neural network models, can all be viewed as special discretizations of such continuous problems. We will also present a framework that is suited for analyzing ML models and algorithms in high dimension, and present results that are free of CoD. Finally, we will discuss the fundamental reasons that are responsible for the success of modern ML, as well as the subtleties and mysteries that still remain to be understood.

Thu, 14 May 2020
16:00
Virtual

Replica-exchange for non-convex optimization

Jing Dong
(Columbia Business School)
Abstract

Gradient descent is known to converge quickly for convex objective functions, but it can be trapped at local minima. On the other hand, Langevin dynamics can explore the state space and find global minima, but in order to give accurate estimates, it needs to run with a small discretization step size and weak stochastic force, which in general slows down its convergence. This work shows that these two algorithms can “collaborate” through a simple exchange mechanism, in which they swap their current positions if the Langevin dynamics yields a lower objective value. This idea can be seen as the singular limit of the replica-exchange technique from the sampling literature. We show that this new algorithm converges to the global minimum linearly with high probability, assuming the objective function is strongly convex in a neighbourhood of the unique global minimum. By replacing gradients with stochastic gradients, and adding a proper threshold to the exchange mechanism, our algorithm can also be used in online settings. This is joint work with Xin Tong at the National University of Singapore.
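
A minimal sketch of the exchange mechanism described above, on a toy one-dimensional non-convex objective with invented parameter choices: gradient descent and a Langevin chain run in parallel and swap positions whenever the Langevin chain finds a lower objective value.

```python
# Toy replica-exchange between gradient descent and Langevin dynamics on a non-convex 1-D objective.
import numpy as np

def f(x):  return 0.1 * x**4 - x**2 + 0.5 * np.sin(5 * x)       # non-convex, several local minima
def df(x): return 0.4 * x**3 - 2 * x + 2.5 * np.cos(5 * x)

rng = np.random.default_rng(0)
x_gd, x_ld = 3.0, 3.0                                            # both replicas start in a poor basin
step, temperature = 0.01, 1.0
for _ in range(5000):
    x_gd -= step * df(x_gd)                                                       # exploitation: plain gradient descent
    x_ld += -step * df(x_ld) + np.sqrt(2 * step * temperature) * rng.normal()     # exploration: Langevin dynamics
    if f(x_ld) < f(x_gd):                                                         # exchange: GD jumps to the better point
        x_gd, x_ld = x_ld, x_gd
print("replica-exchange GD ends at x =", round(x_gd, 3), "with f =", round(float(f(x_gd)), 3))
```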

Thu, 07 May 2020
16:00
Virtual

Variational principles for fluid dynamics on rough paths

James Michael Leahy
(Imperial College)
Further Information
Abstract

We introduce constrained variational principles for fluid dynamics on rough paths. The advection of the fluid is constrained to be the sum of a vector field which represents coarse-scale motion and a rough (in time) vector field which parametrizes fine-scale motion. The rough vector field is regarded as fixed and the rough partial differential equation for the coarse-scale velocity is derived as a consequence of being a critical point of the action functional.

 

The action functional is perturbative in the sense that if the rough vector field is set to zero, then the corresponding variational principle agrees with the reduced (to the vector fields) Euler-Poincaré variational principle introduced in Holm, Marsden and Ratiu (1998). More precisely, the Lagrangian in the action functional encodes the physics of the fluid and is a function of only the coarse-scale velocity.

 

By parametrizing the fine scales of fluid motion with a rough vector field, we preserve the pathwise nature of deterministic fluid dynamics and establish a flexible framework for stochastic parametrization schemes. The main benefit afforded by our approach is that the system of rough partial differential equations we derive satisfies essential conservation laws, including Kelvin’s circulation theorem. This talk is based on recent joint work with Dan Crisan, Darryl Holm, and Torstein Nilssen.

Thu, 30 Apr 2020

16:45 - 18:00
Virtual

Inverting a signature of a path

Weijun Xu
(University of Oxford)
Further Information
Abstract

The signature of a path is a sequence of iterated coordinate integrals along the path. We aim at reconstructing a path from its signature. In the special case of lattice paths, one can obtain exact recovery based on a simple algebraic observation. For general continuously differentiable curves, we develop an explicit procedure that allows us to reconstruct the path via piecewise linear approximations. The errors in the approximation can be quantified in terms of the level of signature used and the modulus of continuity of the derivative of the path. The main idea is philosophically close to that for the lattice paths, and this procedure can be viewed as a significant generalisation. A key ingredient is the use of a symmetrisation procedure that separates the behaviour of the path at small and large scales. We will also discuss possible simplifications and improvements that may be potentially significant. Based on joint works with Terry Lyons, and also with Jiawei Chang, Nick Duffield and Hao Ni.

Thu, 30 Apr 2020

16:00 - 16:45
Virtual

Learning with Signatures: embedding and truncation order selection

Adeline Fermanian
(Sorbonne Université)
Further Information
Abstract

Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. We will be concerned with a novel approach to sequential learning, called the signature method, rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. On the one hand, this approach relies critically on an embedding principle, which consists in representing discretely sampled data as paths, i.e., functions from [0,1] to R^d. We investigate the influence of embeddings on prediction accuracy with an in-depth study of three recent and challenging datasets. We show that a specific embedding, called lead-lag, is systematically better, whatever the dataset or algorithm used. On the other hand, in order to combine signatures with machine learning algorithms, it is necessary to truncate these infinite series. Therefore, we define an estimator of the truncation order and prove its convergence in the expected signature model.
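
As a concrete illustration of the lead-lag embedding mentioned above (a standard construction written from memory; the discretization convention varies slightly across references), here is one common way to turn a scalar series into a two-dimensional path before computing signature features.

```python
# One common lead-lag embedding of a scalar time series into a 2-D path (convention may vary slightly).
import numpy as np

def lead_lag(x):
    """Return a (2n-1, 2) array of (lead, lag) points for a length-n series x."""
    points = [(x[0], x[0])]
    for i in range(1, len(x)):
        points.append((x[i], x[i - 1]))   # the lead component moves first ...
        points.append((x[i], x[i]))       # ... then the lag component catches up
    return np.array(points)

x = np.array([1.0, 3.0, 2.0, 4.0])
print(lead_lag(x))
# Depth-2 signature terms of this path capture, e.g., the quadratic variation of x
# via the area between the lead and lag components.
```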