A central task in modeling, which has to be performed each day in banks and financial institutions, is to calibrate models to market and historical data. So far, the choice of which models to use has been driven not only by their capacity to capture empirically observed market features well, but also by computational tractability considerations. Due to recent work in the context of machine learning, this notion of tractability has changed significantly. In this work, we show how a neural network approach can be applied to the calibration of (multivariate) local stochastic volatility models. We will see how an efficient calibration is possible without the need for interpolation methods for the financial data. Joint work with Christa Cuchiero and Josef Teichmann.

# Past Data Science Seminar

Data analysis techniques are often highly domain specific: certain patterns are expected in certain types of data but may not be apparent in the data at hand. The first part of the talk will cover a technique for finding such patterns through a tool which combines visual analytics and machine learning to provide insight into temporal multivariate data. The second half of the talk will discuss recent work on imposing high-level geometric structure on continuous optimizations, including deep neural networks.

Dimension reduction is an overarching theme in data science: we enjoy finding informative patterns, features or substructures in large, complex data sets. Within the field of network science, an important problem of this nature is to identify core-periphery structure. Given a network, our task is to assign each node to either the core or periphery. Core nodes should be strongly connected across the whole network whereas peripheral nodes should be strongly connected only to core nodes. More generally, we may wish to assign a non-negative value to each node, with a larger value indicating greater "coreness." This type of problem is related to, but distinct from, community detection (finding clusters) and centrality assignment (finding key players), and it arises naturally in the study of networks in social science and finance. We derive and analyse a new iterative algorithm for detecting network core-periphery structure.

Using techniques in nonlinear Perron-Frobenius theory we prove global convergence to the unique solution of a relaxed version of a natural discrete optimization problem. On sparse networks, the cost of each iteration scales linearly with the number of nodes, making the algorithm feasible for large-scale problems. We give an alternative interpretation of the algorithm from the perspective of maximum likelihood reordering of a new logistic core-periphery random graph model. This viewpoint also gives a new basis for quantitatively judging a core-periphery detection algorithm. We illustrate the algorithm on a range of synthetic and real networks, and show that it offers advantages over the current state-of-the-art.

This is joint work with Francesco Tudisco (Strathclyde).

The problem of decomposing a given dataset as a superposition of basic motifs arises in a wide range of application areas, including neural spike sorting and the analysis of astrophysical and microscopy data. Motivated by these problems, we study a "short-and-sparse" deconvolution problem, in which the goal is to recover a short motif $a$ from its convolution with a random spike train $x$. We formulate this problem as optimization over the sphere. We analyze the geometry of this (nonconvex) optimization problem, and argue that when the target spike train is sufficiently sparse, on a region of the sphere, every local minimum is equivalent to the ground truth, up to symmetry (here a signed shift). This characterization obtains, e.g., for generic kernels of length $k$, when the sparsity rate of the spike train is proportional to $k^{-2/3}$ (i.e., roughly $k^{1/3}$ spikes in each length-$k$ window). This geometric characterization implies that efficient methods obtain the ground truth under the same conditions.
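As an illustration of the observation model only (not of the optimization method from the talk), the following minimal sketch generates a short-and-sparse signal and checks the signed-shift symmetry numerically. The Bernoulli-Gaussian spike train, the cyclic convolution, the constant `c`, and all function names are assumptions made for this example.

```python
import numpy as np

def cyclic_conv(u, v):
    """Cyclic convolution of two equal-length real vectors via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)))

def short_and_sparse_observation(k, n, rng, c=1.0):
    """Draw a length-k motif a, a sparse spike train x, and y = a * x (cyclic).

    Assumptions for illustration: a is Gaussian and normalised to the unit
    sphere, x is Bernoulli-Gaussian, and the sparsity rate is c * k**(-2/3).
    """
    a = rng.standard_normal(k)
    a /= np.linalg.norm(a)                  # the motif lives on the sphere
    theta = c * k ** (-2.0 / 3.0)           # sparsity rate ~ k^(-2/3)
    support = rng.random(n) < theta         # Bernoulli support
    x = support * rng.standard_normal(n)    # Gaussian spike amplitudes
    a_padded = np.zeros(n)
    a_padded[:k] = a                        # zero-pad the motif to length n
    y = cyclic_conv(a_padded, x)
    return a, a_padded, x, y, theta

rng = np.random.default_rng(0)
a, a_padded, x, y, theta = short_and_sparse_observation(k=64, n=4096, rng=rng)

# Signed-shift symmetry: shifting and negating the motif while applying the
# opposite shift and sign to the spike train leaves the observation unchanged.
y_shifted = cyclic_conv(-np.roll(a_padded, 3), -np.roll(x, -3))
print(np.allclose(y, y_shifted))
```

With `k = 64` and `c = 1`, the rate is `64**(-2/3) = 1/16`, so a length-64 window contains about `64**(1/3) = 4` spikes on average, matching the scaling quoted above.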

Our analysis highlights the key roles of symmetry and negative curvature in the behavior of efficient methods -- in particular, the role of a "dispersive" structure in promoting efficient convergence to global optimizers without the need to explicitly leverage second-order information. We sketch connections to broader families of benign nonconvex problems in machine learning and signal processing, in which efficient methods obtain global optima independent of initialization. These problems include variants of sparse dictionary learning, tensor decomposition, and phase recovery.

Joint work with Yuqian Zhang, Yenson Lau, Han-Wen Kuo, Dar Gilboa, Sky Cheung, Abhay Pasupathy

A defining feature of robotics today is the use of learning and autonomy in the inner loop of systems that are actually being deployed in the real world, e.g., in autonomous driving or medical robotics. While it is clear that useful autonomous systems must learn to cope with a dynamic environment, requiring architectures that address the richness of the worlds in which such robots must operate, it is also equally clear that ensuring the safety of such systems is the single biggest obstacle preventing scaling up of these solutions. I will discuss an approach to system design that aims at addressing this problem by incorporating programmatic structure in the network architectures being used for policy learning. I will discuss results from two projects in this direction.

Firstly, I will present the perceptor gradients algorithm – a novel approach to learning symbolic representations based on the idea of decomposing an agent’s policy into i) a perceptor network extracting symbols from raw observation data and ii) a task encoding program which maps the input symbols to output actions. We show that the proposed algorithm is able to learn representations that can be directly fed into a Linear-Quadratic Regulator (LQR) or a general purpose A* planner. Our experimental results confirm that the perceptor gradients algorithm is able to efficiently learn transferable symbolic representations as well as generate new observations according to a semantically meaningful specification.

Next, I will describe work on learning from demonstration where the task representation is that of hybrid control systems, with emphasis on extracting models that are explicitly verifiable and easily interpreted by robot operators. Through an architecture that goes from the sensorimotor level involving fitting a sequence of controllers using sequential importance sampling under a generative switching proportional controller task model, to higher level modules that are able to induce a program for a visuomotor reaching task involving loops and conditionals from a single demonstration, we show how a robot can learn tasks such as tower building in a manner that is interpretable and eventually verifiable.

References:

1. S.V. Penkov, S. Ramamoorthy, Learning programmatically structured representations with perceptor gradients, In Proc. International Conference on Learning Representations (ICLR), 2019. http://rad.inf.ed.ac.uk/data/publications/2019/penkov2019learning.pdf

2. M. Burke, S.V. Penkov, S. Ramamoorthy, From explanation to synthesis: Compositional program induction for learning from demonstration, https://arxiv.org/abs/1902.10657

Deep generative models provide powerful tools for fitting difficult distributions such as modelling natural images. But many of these methods, including variational autoencoders (VAEs) and generative adversarial networks (GANs), can be notoriously difficult to fit.

One well-known problem is mode collapse, which means that models can learn to characterize only a few modes of the true distribution. To address this, we introduce VEEGAN, which features a reconstructor network, reversing the action of the generator by mapping from data to noise. Our training objective retains the original asymptotic consistency guarantee of GANs, and can be interpreted as a novel autoencoder loss over the noise.

Second, maximum mean discrepancy networks (MMD-nets) avoid some of the pathologies of GANs, but have not been able to match their performance. We present a new method of training MMD-nets, based on mapping the data into a lower dimensional space, in which MMD training can be more effective. We call these networks Ratio-based MMD Nets, and show that, somewhat mysteriously, they have dramatically better performance than MMD-nets.

A final problem is deciding how many latent components are necessary for a deep generative model to fit a certain data set. We present a nonparametric Bayesian approach to this problem, based on defining a (potentially) infinitely wide deep generative model. Fitting this model is possible by combining variational inference with a Monte Carlo method from statistical physics called Russian roulette sampling. Perhaps surprisingly, we find that this modification helps with the mode collapse problem as well.
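The idea behind Russian roulette sampling can be shown on a toy problem: estimating an infinite sum without bias while only ever evaluating finitely many terms. The sketch below is a generic random-truncation estimator, not the inference procedure from the talk; the function name, the fixed continuation probability, and the geometric test series are assumptions for illustration.

```python
import numpy as np

def russian_roulette_sum(term, rng, continue_prob=0.9):
    """Unbiased single-sample estimate of sum_{k>=0} term(k).

    After each term the sum is continued with probability continue_prob
    (which must lie strictly between 0 and 1). Term k is reached with
    probability continue_prob**k, so dividing it by that probability
    keeps the estimator unbiased.
    """
    total = 0.0
    weight = 1.0   # probability that the current term is reached
    k = 0
    while True:
        total += term(k) / weight
        if rng.random() >= continue_prob:   # stop the roulette
            return total
        weight *= continue_prob
        k += 1

rng = np.random.default_rng(0)
# Toy series: sum_{k>=0} 0.5**k = 2.
estimates = [russian_roulette_sum(lambda k: 0.5 ** k, rng) for _ in range(20000)]
print(np.mean(estimates))
```

Each call touches a random, finite number of terms, yet the average over many calls converges to the full infinite sum. This is the property that makes it possible to work with a model that is, in principle, infinitely wide.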

Generative adversarial networks (GANs) use neural networks as generative models, creating realistic samples that mimic real-life reference samples (for instance, images of faces, bedrooms, and more). These networks require an adaptive critic function while training, to teach the networks how to improve their samples to better match the reference data. I will describe a kernel divergence measure, the maximum mean discrepancy, which represents one such critic function. With gradient regularisation, the MMD is used to obtain current state-of-the-art performance on challenging image generation tasks, including 160 × 160 CelebA and 64 × 64 ImageNet. In addition to adversarial network training, I'll discuss issues of gradient bias for GANs based on integral probability metrics, and mechanisms for benchmarking GAN performance.
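For readers unfamiliar with the maximum mean discrepancy, the following minimal sketch computes the standard unbiased estimate of the squared MMD between two samples. It assumes a fixed Gaussian kernel with a hand-picked bandwidth; the learned kernel features and gradient regularisation discussed in the talk are not reproduced here, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Within-sample averages exclude the diagonal (i == j) terms.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
reference = rng.standard_normal((500, 2))
same_dist = rng.standard_normal((500, 2))
shifted = rng.standard_normal((500, 2)) + 2.0

print(mmd2_unbiased(reference, same_dist))   # close to zero
print(mmd2_unbiased(reference, shifted))     # clearly positive
```

In a GAN setting, a statistic of this kind is evaluated between generated and reference samples and minimised with respect to the generator, so a value near zero indicates that the two samples are hard to tell apart under the chosen kernel.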

Data science has become a topic of great interest lately and has triggered new wide-scale research activities around efficient first-order methods for optimisation and Bayesian sampling. The National Physical Laboratory is addressing some of these challenges with particular focus on robustness and confidence in the solution. In this talk, I will present some problems and recent results concerning (i) robust learning in the presence of outliers based on the Median of Means (MoM) principle and (ii) stability of the solution in super-resolution (joint work with A. Thompson and B. Toader).
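The Median of Means principle is easiest to see in its simplest form, as a robust estimator of a mean. The sketch below is that basic estimator only, not the learning procedure from the talk; the function name and the block count are choices made for this example.

```python
import numpy as np

def median_of_means(samples, n_blocks, rng=None):
    """Split the samples into n_blocks blocks and return the median of the block means.

    If rng is given, the samples are shuffled first so that the blocks do not
    depend on the original ordering.
    """
    samples = np.asarray(samples, dtype=float)
    if rng is not None:
        samples = rng.permutation(samples)
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([block.mean() for block in blocks]))

# Blocks [1, 2], [3, 4], [5, 1000] have means 1.5, 3.5, 502.5.
print(median_of_means([1, 2, 3, 4, 5, 1000], n_blocks=3))  # 3.5

rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=1.0, size=1000)
corrupted = np.concatenate([clean, np.full(10, 1e6)])   # ten gross outliers
print(np.mean(corrupted))                                # dominated by the outliers
print(median_of_means(corrupted, n_blocks=41, rng=rng))  # stays near 1
```

The estimator tolerates outliers as long as fewer than half of the blocks are contaminated, which is why the example uses more than twice as many blocks as outliers.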

Landmark-based human action recognition in videos is a challenging task in computer vision. One crucial step is to design discriminative features for spatial structure and temporal dynamics. To this end, we use and refine the path signature as an expressive, robust, nonlinear, and interpretable representation for landmark-based streamed data. Instead of extracting signature features from raw sequences, we propose path disintegrations and transformations as preprocessing to improve the efficiency and effectiveness of signature features. The path disintegrations spatially localize a pose into a collection of m-node paths, from which the signatures encode non-local and non-linear geometrical dependencies, and temporally transform the evolution of spatial features into hierarchical spatio-temporal paths, from which the signatures encode long- and short-term dynamical dependencies. The path transformations allow the signatures to further explore correlations among different informative clues. Finally, all features are concatenated to constitute the input vector of a linear fully-connected network for action recognition. Experimental results on four benchmark datasets demonstrate that the proposed feature sets, with only a linear network, achieve results comparable to state-of-the-art deep learning methods.
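To make the notion of a path signature concrete, the following minimal sketch computes the first two signature levels of a piecewise-linear path, accumulating one segment at a time with Chen's identity. It illustrates signature features in general; the path disintegrations and transformations proposed in the talk are not reproduced, and the function name is an assumption.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path is an array of shape (n_points, d). Level 1 is the total increment;
    level 2 collects the iterated integrals S[i, j] of dX_i followed by dX_j.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

# An L-shaped path: one unit right, then one unit up.
level1, level2 = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(level1)   # [1. 1.]
print(level2)   # [[0.5 1. ] [0.  0.5]]

# The antisymmetric part of level 2 is the signed (Levy) area of the path.
levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
print(levy_area)  # 0.5

# Flattened features of this kind can serve as input to a linear classifier.
features = np.concatenate([level1, level2.ravel()])
```

Level 1 records only the net displacement, while level 2 also depends on the order in which the coordinates move, which is what makes the signature sensitive to the shape of a trajectory rather than just its endpoints.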

In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in various fields from computer vision to speech recognition. So far research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, high energy physics, recommendation systems, and social media analysis. The adoption of deep learning in these fields has been lagging behind until recently, primarily since the non-Euclidean nature of objects dealt with makes the very definition of basic operations used in deep networks rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, overview existing solutions and outline the key difficulties and future research directions. As examples of applications, I will show problems from the domains of computer vision, graphics, high-energy physics, and fake news detection.