Past DataSig Seminars

14 October 2021
16:00
George Wynne

Further Information: 

www.datasig.ac.uk/events

Abstract

Kernel-based statistical algorithms have found wide success in statistical machine learning over the past ten years as a non-parametric, easily computable engine for reasoning with probability measures. The main idea is to use a kernel to map probability measures, the objects of interest, into well-behaved spaces where calculations can be carried out. This methodology has been applied widely, for example to two-sample testing, independence testing, goodness-of-fit testing, parameter inference and MCMC thinning. Most theoretical investigations and practical applications have focused on Euclidean data. This talk will outline work that adapts the kernel-based methodology to data in an arbitrary Hilbert space, which opens the door to applications for functional data, where a single data sample is a discretely observed function, for example a time series or a random surface. Such data are becoming increasingly prominent in statistics and machine learning. Emphasis will be given to the two-sample and goodness-of-fit testing problems.
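As a rough illustration of the kernel methodology, here is a minimal sketch of a two-sample test between two samples of discretely observed functions, each treated as a vector (the Gaussian kernel, bandwidth, permutation test and toy data below are our own choices, not taken from the talk):

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2(X, Y, bandwidth):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (gaussian_kernel(X, X, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean()
            - 2 * gaussian_kernel(X, Y, bandwidth).mean())

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
# Two samples of discretised random functions with slightly different mean curves.
X = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((50, t.size))
Y = np.sin(2 * np.pi * t + 0.2) + 0.3 * rng.standard_normal((50, t.size))

observed = mmd2(X, Y, bandwidth=1.0)
# Permutation test: reshuffle the pooled sample to approximate the null distribution.
pooled = np.vstack([X, Y])
null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(mmd2(pooled[perm[:50]], pooled[perm[50:]], bandwidth=1.0))
p_value = np.mean([n >= observed for n in null])
print(f"MMD^2 = {observed:.4f}, permutation p-value = {p_value:.3f}")
```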


22 September 2021
09:00
Yuzuru Inahama

Abstract

Stochastic differential equations (SDEs) on compact foliated spaces were introduced a few years ago. As a corollary, a leafwise Brownian motion on a compact foliated space was obtained as a solution to an SDE. In this work we construct stochastic flows associated with these SDEs by using rough path theory, which is something like a 'deterministic version' of Itô's SDE theory.

This is joint work with Kiyotaka Suzaki.


8 September 2021
09:00
Abstract

Although a multidimensional data array can be very large, it may contain coherence patterns much smaller in size. For example, we may need to detect a subset of genes that co-express under a subset of conditions. In this presentation, we discuss our recently developed co-clustering algorithms for the extraction and analysis of coherent patterns in big datasets. In our method, a co-cluster, corresponding to a coherent pattern, is represented as a low-rank tensor and can be detected from the intersection of hyperplanes in a high-dimensional data space. Our method has been used successfully for DNA and protein data analysis, disease diagnosis, drug therapeutic effect assessment, and feature selection in human facial expression classification. It can also be useful for many other real-world data mining, image processing and pattern recognition applications.
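As a toy illustration of the link between co-clusters and low-rank structure (a simplified sketch of the idea, not the hyperplane-based algorithm of the talk; the data and thresholds are invented), one can embed a rank-one block in a noisy matrix and recover the participating rows and columns from the leading singular vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((200, 100)) * 0.5           # background noise
rows, cols = np.arange(20, 50), np.arange(10, 30)       # hidden co-cluster support
u = rng.uniform(1, 2, rows.size)                        # e.g. gene-specific levels
v = rng.uniform(1, 2, cols.size)                        # e.g. condition-specific levels
data[np.ix_(rows, cols)] += np.outer(u, v)              # coherent (rank-one) pattern

U, s, Vt = np.linalg.svd(data, full_matrices=False)
row_score, col_score = np.abs(U[:, 0]), np.abs(Vt[0])
est_rows = np.where(row_score > 2 * row_score.mean())[0]
est_cols = np.where(col_score > 2 * col_score.mean())[0]
print("recovered rows:", est_rows)
print("recovered cols:", est_cols)
```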


10 June 2021
16:00
Blanka Horvath

Abstract

Techniques that address sequential data have been a central theme in machine learning research in recent years. More recently, such considerations have entered the field of finance-related ML applications in several areas where we face inherently path-dependent problems: from (deep) pricing and hedging (of path-dependent options) to generative modelling of synthetic market data, which we refer to as market generation.

We revisit Deep Hedging from the perspective of the role of the data streams used for training and highlight how this perspective motivates the use of highly accurate generative models for synthetic data generation. From this, we draw conclusions regarding the implications for risk management and model governance of these applications, in contrast to risk management in classical quantitative finance approaches.

Indeed, financial ML applications and their risk management rely heavily on a solid means of measuring and efficiently computing (similarity) metrics between datasets consisting of sample paths of stochastic processes. Stochastic processes are at their core random variables with values in path space. However, while the distance between two (finite-dimensional) distributions was historically well understood, the extension of this notion to the level of stochastic processes remained a challenge until recently. We discuss the effect of different choices of such metrics while revisiting some topics that are central to ML-augmented quantitative finance applications (such as the synthetic generation and the evaluation of similarity of data streams) from a regulatory (and model governance) perspective. Finally, we discuss the effect of considering refined metrics which respect and preserve the information structure (the filtration) of the market, and the implications and relevance of such metrics for financial results.
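As a minimal sketch of one such metric between sets of sample paths (a toy choice of ours, using truncated signature features and a linear kernel; the refined, filtration-respecting metrics discussed in the talk are more involved), one can compare the mean level-two signatures of two collections of paths:

```python
import numpy as np

def sig_level2(path):
    """Levels 1 and 2 of the path signature of a piecewise-linear path, flattened."""
    inc = np.diff(path, axis=0)               # increments of the path
    s1 = inc.sum(axis=0)                       # level 1: total displacement
    before = np.cumsum(inc, axis=0) - inc      # sum of increments strictly before each step
    s2 = before.T @ inc + 0.5 * inc.T @ inc    # level 2: iterated integrals
    return np.concatenate([s1, s2.ravel()])

rng = np.random.default_rng(2)
def sample_paths(n, drift):
    steps = 0.1 * rng.standard_normal((n, 100, 2)) + drift
    return np.cumsum(steps, axis=1)

# Two sets of 2-dimensional sample paths with slightly different drift.
A = np.array([sig_level2(p) for p in sample_paths(200, drift=0.00)])
B = np.array([sig_level2(p) for p in sample_paths(200, drift=0.01)])
# With a linear kernel on signature features, the MMD is the distance of mean signatures.
print("signature MMD (linear kernel):", np.linalg.norm(A.mean(0) - B.mean(0)))
```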


3 June 2021
16:00
Ismaël Bailleul

Abstract

In its simplest instance, kinetic Brownian motion in $\mathbb{R}^d$ is a $C^1$ random path $(m_t, v_t)$ whose unit velocity $v_t$ is a Brownian motion on the unit sphere run at speed $a > 0$. Properly time-rescaled as a function of the parameter $a$, its position process converges to a Brownian motion in $\mathbb{R}^d$ as $a$ tends to infinity. On the other hand, the motion converges to the straight-line (i.e. geodesic) motion as $a$ goes to 0. Kinetic Brownian motion thus provides an interpolation between geodesic and Brownian flows in this setting. Think now of replacing $\mathbb{R}^d$ by the diffeomorphism group of a fluid domain, with the velocity now a vector field on the domain. I will explain how one can prove in this setting an interpolation result similar to the previous one, giving an interpolation between Euler's equations of incompressible flows and a Brownian-like flow on the diffeomorphism group.
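A minimal simulation sketch (a projected Euler scheme of our own choosing, not taken from the talk) illustrates the two regimes of kinetic Brownian motion in $\mathbb{R}^d$: nearly straight motion for small $a$ and diffusive behaviour for large $a$:

```python
import numpy as np

def kinetic_brownian(d=3, a=1.0, T=10.0, n_steps=10_000, seed=0):
    """Simulate (m_t, v_t): v_t is a sphere-valued Brownian motion at speed a, m_t its integral."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    m = np.zeros((n_steps + 1, d))
    v = np.zeros((n_steps + 1, d))
    v[0, 0] = 1.0                                    # start with unit velocity e_1
    for k in range(n_steps):
        xi = rng.standard_normal(d)
        tangent = xi - np.dot(xi, v[k]) * v[k]        # project noise onto the tangent space of the sphere
        w = v[k] + np.sqrt(a * dt) * tangent
        v[k + 1] = w / np.linalg.norm(w)              # re-normalise: velocity stays on the unit sphere
        m[k + 1] = m[k] + v[k + 1] * dt               # position integrates the unit velocity
    return m, v

# Small a: nearly geodesic (straight) motion; large a: Brownian-like position after rescaling.
m_slow, _ = kinetic_brownian(a=0.1)
m_fast, _ = kinetic_brownian(a=100.0)
print(m_slow[-1], m_fast[-1])
```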


13 May 2021
16:00
Richard Samworth

Abstract

We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package 'ocd', and we also demonstrate its utility on a seismology data set.
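The following toy detector sketches the general idea (a simplified stand-in of our own, not the algorithm implemented in the 'ocd' package): per-coordinate likelihood-ratio/CUSUM statistics are maintained against a grid of candidate post-change mean sizes and aggregated across scales and coordinates, with storage and per-observation cost independent of the number of past observations:

```python
import numpy as np

class ToyOnlineDetector:
    def __init__(self, p, scales=(0.25, 0.5, 1.0, 2.0), threshold=30.0):
        self.scales = np.array(scales)            # candidate post-change mean sizes
        self.stats = np.zeros((p, len(scales), 2))  # one CUSUM per (coordinate, scale, sign)
        self.threshold = threshold

    def update(self, x):
        """Process one new p-variate observation; return True if a change is declared."""
        for s, b in enumerate(self.scales):
            for sign, mu in enumerate((b, -b)):
                # Gaussian log-likelihood-ratio increment for mean 0 vs mean mu.
                incr = mu * x - 0.5 * mu**2
                self.stats[:, s, sign] = np.maximum(0.0, self.stats[:, s, sign] + incr)
        return self.stats.max() > self.threshold

rng = np.random.default_rng(3)
det = ToyOnlineDetector(p=100)
for t in range(2000):
    x = rng.standard_normal(100)
    if t >= 1000:
        x[:5] += 0.5                    # sparse mean change in 5 of 100 coordinates
    if det.update(x):
        print("change declared at time", t)
        break
```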


6 May 2021
16:00
Peter K Friz

Abstract

We revisit rough paths and signatures from a geometric and "smooth model" perspective. This provides a lean framework to understand and formulate key concepts of the theory, including recent insights on higher-order translation, also known as renormalization of rough paths. This first part is joint work with C Bellingeri (TU Berlin) and S Paycha (U Potsdam). In a second part, we take a semimartingale perspective and more specifically analyze the structure of expected signatures when written in exponential form. Following Bonnier-Oberhauser (2020), we call the resulting objects signature cumulants. These can be described - and recursively computed - in a way that can be seen as a unification of previously unrelated pieces of mathematics, including Magnus (1954), Lyons-Ni (2015), Gatheral and coworkers (2017 onwards) and Lacoin-Rhodes-Vargas (2019). This is joint work with P Hager and N Tapia.
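To make the exponential form concrete, here is a sketch, in our own notation and truncated at level two, of what writing the expected signature in exponential form amounts to: one sets $\mathbb{E}[\mathrm{Sig}(X)] = \exp(\kappa)$, i.e. $\kappa = \log \mathbb{E}[\mathrm{Sig}(X)]$, and expanding the logarithm in the truncated tensor algebra gives
\[
  \kappa_1 = \mathbb{E}[S_1], \qquad
  \kappa_2 = \mathbb{E}[S_2] - \tfrac{1}{2}\,\mathbb{E}[S_1]\otimes\mathbb{E}[S_1],
\]
where $S_1$ and $S_2$ denote the level-one and level-two parts of the signature. For a single linear piece of path this reduces to the mean and half the covariance of the increment, mirroring how classical cumulants refine moments.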


29 April 2021
16:00
Aapo Hyvärinen

Abstract

Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been used successfully in the linear case, especially in the form of independent component analysis (ICA). However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover the latent components that actually generated the data. Recently, we have shown that this problem can be solved by using additional information, in particular in the form of temporal structure or some additional observed variable. Our methods were originally based on the "self-supervised" learning increasingly used in deep learning, but in more recent work, we have provided likelihood-based approaches. In particular, we have developed computational methods for efficient maximization of the likelihood for two variants of the model, based on variational inference and Riemannian relative gradients, respectively.
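For contrast, the identifiable linear case mentioned in the abstract can be illustrated with a minimal linear ICA example (using scikit-learn's FastICA, a tooling choice of ours): independent non-Gaussian sources mixed linearly are recovered up to permutation and scaling, which is exactly what fails for a naive nonlinear extension:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
n = 5000
sources = np.column_stack([
    np.sign(rng.standard_normal(n)) * rng.exponential(1.0, n),  # Laplace-like source
    rng.uniform(-1, 1, n),                                       # uniform source
])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
observed = sources @ mixing.T                                    # linear mixtures

recovered = FastICA(n_components=2, random_state=0).fit_transform(observed)
# Check recovery up to permutation/scaling via absolute correlations with the true sources.
corr = np.abs(np.corrcoef(recovered.T, sources.T)[:2, 2:])
print(np.round(corr, 2))
```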


21 April 2021
09:00
Abstract

Path signatures have unique advantages for extracting high-order differential features of sequential data. Our team has been studying path signature theory and actively applying it to various applications, including infant cognitive score prediction, human motion recognition, handwritten character recognition, handwritten text line recognition and writer identification. In this talk, I will share our most recent work on infant cognitive score prediction using deep path signatures. The cognitive score can reveal an individual's intelligence, motor and language abilities. Recent research has found that cognitive ability is closely related to an individual's cortical structure and its development. We have proposed two frameworks to predict the cognitive score with different path signature features. In the first framework, we construct the temporal path signature along age growth and extract signature features of the developing infant cortical measurements. By incorporating the cortical path signature into a multi-stream deep learning model, the individual cognitive score can be predicted even in the presence of missing data. In the second framework, we propose a deep path signature algorithm to compute the developmental features and obtain a developmental connectivity matrix, and then design a graph convolutional network for score prediction. These two frameworks have been tested on two in-house cognitive data sets and achieve state-of-the-art results.
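As a schematic sketch of the signature-feature pipeline (entirely synthetic data; the iisignature package and scikit-learn are assumed available, and all names, shapes and the score model below are hypothetical, not the frameworks of the talk), one can compute level-two path signatures of per-subject longitudinal measurements and feed them to a regressor:

```python
import numpy as np
import iisignature
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_subjects, n_visits, n_regions = 120, 4, 10
# Synthetic developmental trajectories: cortical measures at successive ages per subject.
ages = np.linspace(0, 2, n_visits)
growth = rng.uniform(0.5, 1.5, (n_subjects, n_regions))
trajectories = growth[:, None, :] * ages[None, :, None] \
    + 0.1 * rng.standard_normal((n_subjects, n_visits, n_regions))
# Hypothetical cognitive score driven by average growth rate plus noise.
scores = growth.mean(axis=1) + 0.05 * rng.standard_normal(n_subjects)

# Level-2 path signature of each subject's (age, measurements) trajectory.
features = np.array([
    iisignature.sig(np.column_stack([ages, traj]), 2) for traj in trajectories
])
print(cross_val_score(Ridge(alpha=1.0), features, scores, cv=5).mean())
```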


25 March 2021
16:00
Fabrice Baudoin

Abstract

We study several matrix diffusion processes constructed from a unitary Brownian motion. In particular, we use the Stiefel fibration to lift the Brownian motion of the complex Grassmannian to the complex Stiefel manifold and deduce a skew-product decomposition of the Stiefel Brownian motion. As an application, we prove asymptotic laws for the determinants of the block entries of the unitary Brownian motion.
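A toy simulation sketch (a geodesic Euler scheme with Gaussian skew-Hermitian increments, a discretisation of our own choosing rather than the construction of the paper) of Brownian motion on the unitary group, tracking the determinant of a block entry as in the asymptotic laws mentioned above:

```python
import numpy as np
from scipy.linalg import expm

def skew_hermitian_increment(n, dt, rng):
    """Gaussian element of the Lie algebra u(n), scaled by sqrt(dt)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (G + G.conj().T) / 2                  # Hermitian (GUE-like) matrix
    return 1j * H * np.sqrt(dt / n)           # skew-Hermitian, variance scaled with n

rng = np.random.default_rng(6)
n, k, dt, n_steps = 8, 3, 1e-3, 5000
U = np.eye(n, dtype=complex)
block_dets = []
for _ in range(n_steps):
    U = U @ expm(skew_hermitian_increment(n, dt, rng))   # geodesic Euler step on U(n)
    block_dets.append(np.linalg.det(U[:k, :k]))          # determinant of the k x k block entry
print("final |det of block|:", abs(block_dets[-1]))
```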

