Past Data Science Seminar | Mathematical Institute

Fri, 19 Feb 2021

12:00 - 13:00

The Unlimited Sampling Approach to Computational Sensing and Imaging

Ayush Bhandari

(Imperial College, London)

Abstract

Digital data capture is the backbone of all modern day systems and “Digital Revolution” has been aptly termed as the Third Industrial Revolution. Underpinning the digital representation is the Shannon-Nyquist sampling theorem and more recent developments include compressive sensing approaches. The fact that there is a physical limit to which sensors can measure amplitudes poses a fundamental bottleneck when it comes to leveraging the performance guaranteed by recovery algorithms. In practice, whenever a physical signal exceeds the maximum recordable range, the sensor saturates, resulting in permanent information loss. Examples include (a) dosimeter saturation during the Chernobyl reactor accident, reporting radiation levels far lower than the true value and (b) loss of visual cues in self-driving cars coming out of a tunnel (due to sudden exposure to light).

To reconcile this gap between theory and practice, we introduce the Unlimited Sensing framework or the USF that is based on a co-design of hardware and algorithms. On the hardware front, our work is based on a radically different analog-to-digital converter (ADC) design, which allows for the ADCs to produce modulo or folded samples. On the algorithms front, we develop new, mathematically guaranteed recovery strategies.

In the first part of this talk, we prove a sampling theorem akin to the Shannon-Nyquist criterion. We show that, remarkably, despite the non-linearity in sensing pipeline, the sampling rate only depends on the signal’s bandwidth. Our theory is complemented with a stable recovery algorithm. Beyond the theoretical results, we will also present a hardware demo that shows our approach in action.

Moving further, we reinterpret the unlimited sensing framework as a generalized linear model that motivates a new class of inverse problems. We conclude this talk by presenting new results in the context of single-shot high-dynamic-range (HDR) imaging, sensor array processing and HDR tomography based on the modulo Radon transform.

Fri, 20 Nov 2020

12:00 - 13:00

Selection Dynamics for Deep Neural Networks

Peter Markowich

(KAUST)

Abstract

We present a partial differential equation framework for deep residual neural networks and for the associated learning problem. This is done by carrying out the continuum limits of neural networks with respect to width and depth. We study the wellposedness, the large time solution behavior, and the characterization of the steady states of the forward problem. Several useful time-uniform estimates and stability/instability conditions are presented. We state and prove optimality conditions for the inverse deep learning problem, using standard variational calculus, the Hamilton-Jacobi-Bellmann equation and the Pontryagin maximum principle. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between neural networks, PDE theory, variational analysis, optimal control, and deep learning.

This is based on joint work with Hailiang Liu.

Fri, 13 Nov 2020

12:00 - 13:00

Computational Hardness of Hypothesis Testing and Quiet Plantings

Afonso Bandeira

(ETH Zurich)

Abstract

When faced with a data analysis, learning, or statistical inference problem, the amount and quality of data available fundamentally determines whether such tasks can be performed with certain levels of accuracy. With the growing size of datasets however, it is crucial not only that the underlying statistical task is possible, but also that is doable by means of efficient algorithms. In this talk we will discuss methods aiming to establish limits of when statistical tasks are possible with computationally efficient methods or when there is a fundamental «Statistical-to-Computational gap›› in which an inference task is statistically possible but inherently computationally hard. We will focus on Hypothesis Testing and the ``Low Degree Method'' and also address hardness of certification via ``quiet plantings''. Guiding examples will include Sparse PCA, bounds on the Sherrington Kirkpatrick Hamiltonian, and lower bounds on Chromatic Numbers of random graphs.

Fri, 06 Nov 2020

12:00 - 13:00

Bridging GANs and Stochastic Analysis

Haoyang Cao

(Alan Turing Institute)

Abstract

Generative adversarial networks (GANs) have enjoyed tremendous success in image generation and processing, and have recently attracted growing interests in other fields of applications. In this talk we will start from analyzing the connection between GANs and mean field games (MFGs) as well as optimal transport (OT). We will first show a conceptual connection between GANs and MFGs: MFGs have the structure of GANs, and GANs are MFGs under the Pareto Optimality criterion. Interpreting MFGs as GANs, on one hand, will enable a GANs-based algorithm (MFGANs) to solve MFGs: one neural network (NN) for the backward Hamilton-Jacobi-Bellman (HJB) equation and one NN for the Fokker-Planck (FP) equation, with the two NNs trained in an adversarial way. Viewing GANs as MFGs, on the other hand, will reveal a new and probabilistic aspect of GANs. This new perspective, moreover, will lead to an analytical connection between GANs and Optimal Transport (OT) problems, and sufficient conditions for the minimax games of GANs to be reformulated in the framework of OT. Building up from the probabilistic views of GANs, we will then establish the approximation of GANs training via stochastic differential equations and demonstrate the convergence of GANs training via invariant measures of SDEs under proper conditions. This stochastic analysis for GANs training can serve as an analytical tool to study its evolution and stability.

Fri, 30 Oct 2020

12:00 - 13:00

Neural differential equations in machine learning

Patrick Kidger

(Oxford Mathematics)

Abstract

Differential equations and neural networks are two of the most widespread modelling paradigms. I will talk about how to combine the best of both worlds through neural differential equations. These treat differential equations as a learnt component of a differentiable computation graph, and as such integrates tightly with current machine learning practice. Applications are widespread. I will begin with an introduction to the theory of neural ordinary differential equations, which may for example be used to model unknown physics. I will then move on to discussing recent work on neural controlled differential equations, which are state-of-the-art models for (arbitrarily irregular) time series. Next will be some discussion of neural stochastic differential equations: we will see that the mathematics of SDEs is precisely aligned with the machine learning of GANs, and thus NSDEs may be used as generative models. If time allows I will then discuss other recent work, such as how the training of neural differential equations may be sped up by ~40% by tweaking standard numerical solvers to respect the particular nature of the differential equations. This is joint work with Ricky T. Q. Chen, Xuechen Li, James Foster, and James Morrill.

Fri, 16 Oct 2020

12:00 - 13:00

Advances in Topology-Based Graph Classification

Bastian Rieck

(ETH Zurich)

Abstract

Topological data analysis has proven to be an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of new algorithms that make use of topological features. Graph classification is of particular interest here, since graphs are inherently amenable to a topological description in terms of their connected components and cycles. This talk will briefly summarise recent advances in topology-based graph classification, focussing equally on ’shallow’ and ‘deep’ approaches. Starting from an intuitive description of persistent homology, we will discuss how to incorporate topological features into the Weisfeiler–Lehman colour refinement scheme, thus obtaining a simple feature-based graph classification algorithm. We will then build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ functions, which can be learned in an end-to-end fashion. Care has been taken to make the talk accessible to an audience that might not have been exposed to machine learning or topological data analysis.

Fri, 13 Mar 2020

12:00 - 13:00

Cancelled: Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Armin Eftekhari

(Umea University)

Abstract

This talk has been cancelled.

Fri, 14 Feb 2020

12:00 - 13:00

Adaptive Gradient Descent without Descent

Konstantin Mischenko

(King Abdullah University of Science and Technology (KAUST))

Abstract

We show that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on smoothness in a neighborhood of a solution. Given that the problem is convex, our method will converge even if the global smoothness constant is infinity. As an illustration, it can minimize arbitrary continuously twice-differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including matrix factorization and training of ResNet-18.

Fri, 31 Jan 2020

12:00 - 13:00

Geometric methods on low-rank matrix and tensor manifolds

Bart Vandereycken

(Université de Genève)

Abstract

I will present numerical methods for low-rank matrix and tensor problems that explicitly make use of the geometry of rank constrained matrix and tensor spaces. We focus on two types of problems: The first are optimization problems, like matrix and tensor completion, solving linear systems and eigenvalue problems. Such problems can be solved by numerical optimization for manifolds, called Riemannian optimization methods. We will explain the basic elements of differential geometry in order to apply such methods efficiently to rank constrained matrix and tensor spaces. The second type of problem is ordinary differential equations, defined on matrix and tensor spaces. We show how their solution can be approximated by the dynamical low-rank principle, and discuss several numerical integrators that rely in an essential way on geometric properties that are characteristic to sets of low rank matrices and tensors. Based on joint work with André Uschmajew (MPI MiS Leipzig).

Fri, 24 Jan 2020

12:00 - 13:00

Tensor methods in optimization

Geovani Grapiglia

(Universidade Federal do Paraná)

Abstract

In this talk we present p-order methods for unconstrained minimization of convex functions that are p-times differentiable with Hölder continuous p-th derivatives. We establish worst-case complexity bounds for methods with and without acceleration. Some of these methods are "universal", that is, they do not require prior knowledge of the constants that define the smoothness level of the objective function. A lower complexity bound for this problem class is also obtained. This is a joint work with Yurii Nesterov (Université Catholique de Louvain).

Fri, 08 Nov 2019

12:00 - 13:00

Algebra, Geometry and Topology of ERK Enzyme Kinetics

Heather Harrington

(Mathematical Institute (University of Oxford))

Abstract

In this talk I will analyse ERK time course data by developing mathematical models of enzyme kinetics. I will present how we can use differential algebra and geometry for model identifiability, and topological data analysis to study these the dynamics of ERK. This work is joint with Lewis Marsh, Emilie Dufresne, Helen Byrne and Stanislav Shvartsman.

Fri, 18 Oct 2019

12:00 - 13:00

DPM: A deep learning algorithm for estimating PDE models from data

Justin Sirignano

(The University of Illinois at Urbana-Champaign)

Abstract

Machine learning for scientific applications faces the challenge of limited data. To reduce overfitting, we propose a framework to leverage as much as possible a priori-known physics for a problem. Our approach embeds a deep neural network in a partial differential equation (PDE) system, where the pre-specified terms in the PDE capture the known physics and the neural network will learn to describe the unknown physics. The neural network is estimated from experimental and/or high-fidelity numerical datasets. We call this approach a “deep learning PDE model” (DPM). Once trained, the DPM can be used to make out-of-sample predictions for new physical coefficients, geometries, and boundary conditions. We implement our approach on a classic problem of the Navier-Stokes equation for turbulent flows. The numerical solution of the Navier-Stokes equation (with turbulence) is computationally expensive and requires a supercomputer. We show that our approach can estimate a (computationally fast) DPM for the filtered velocity of the Navier-Stokes equations.

Fri, 14 Jun 2019

12:00 - 13:00

A neural network approach to SLV Calibration

Wahid Khosrawi

(ETH Zurich)

Abstract

A central task in modeling, which has to be performed each day in banks and financial institutions, is to calibrate models to market and historical data. So far the choice which models should be used was not only driven by their capacity of capturing empirically the observed market features well, but rather by computational tractability considerations. Due to recent work in the context of machine learning, this notion of tractability has changed significantly. In this work, we show how a neural network approach can be applied to the calibration of (multivariate) local stochastic volatility models. We will see how an efficient calibration is possible without the need of interpolation methods for the financial data. Joint work with Christa Cuchiero and Josef Teichmann.

Fri, 07 Jun 2019

12:00 - 13:00

Finding and Imposing Qualitative Properties in Data

Primoz Skraba

(Queen Mary University of London)

Abstract

Data analysis techniques are often highly domain specific - there are often certain patterns which should be in certain types of data but may not be apparent in data. The first part of the talk will cover a technique for finding such patterns through a tool which combines visual analytics and machine learning to provide insight into temporal multivariate data. The second half of the talk will discuss recent work on imposing high level geometric structure into continuous optimizations including deep neural networks.

Fri, 31 May 2019

12:00 - 13:00

A Nonlinear Spectral Method for Network Core-Periphery Detection

Desmond Higham

(University of Edinburgh)

Abstract

Dimension reduction is an overarching theme in data science: we enjoy finding informative patterns, features or substructures in large, complex data sets. Within the field of network science, an important problem of this nature is to identify core-periphery structure. Given a network, our task is to assign each node to either the core or periphery. Core nodes should be strongly connected across the whole network whereas peripheral nodes should be strongly connected only to core nodes. More generally, we may wish to assign a non-negative value to each node, with a larger value indicating greater "coreness." This type of problem is related to, but distinct from, commumnity detection (finding clusters) and centrality assignment (finding key players), and it arises naturally in the study of networks in social science and finance. We derive and analyse a new iterative algorithm for detecting network core-periphery structure.

Using techniques in nonlinear Perron-Frobenius theory we prove global convergence to the unique solution of a relaxed version of a natural discrete optimization problem. On sparse networks, the cost of each iteration scales linearly with the number of nodes, making the algorithm feasible for large-scale problems. We give an alternative interpretation of the algorithm from the perspective of maximum likelihood reordering of a new logistic core--periphery random graph model. This viewpoint also gives a new basis for quantitatively judging a core--periphery detection algorithm. We illustrate the algorithm on a range of synthetic and real networks, and show that it offers advantages over the current state-of-the-art.

This is joint work with Francesco Tudisco (Strathclyde)

Fri, 10 May 2019

12:00 - 13:00

Nonconvex Sparse Deconvolution: Global Optima and Efficient Methods

John Wright

(Columbia University)

Abstract

The problem of decomposing a given dataset as a superposition of basic motifs arises in a wide range of application areas, including neural spike sorting and the analysis of astrophysical and microscopy data. Motivated by these problems, we study a "short-and-sparse" deconvolution problem, in which the goal is to recover a short motif a from its convolution with a random spike train $x$. We formulate this problem as optimization over the sphere. We analyze the geometry of this (nonconvex) optimization problem, and argue that when the target spike train is sufficiently sparse, on a region of the sphere, every local minimum is equivalent to the ground truth, up to symmetry (here a signed shift). This characterization obtains, e.g., for generic kernels of length $k$, when the sparsity rate of the spike train is proportional to $k^{-2/3}$ (i.e., roughly $k^{1/3}$ spikes in each length-$k$ window). This geometric characterization implies that efficient methods obtain the ground truth under the same conditions.

Our analysis highlights the key roles of symmetry and negative curvature in the behavior of efficient methods -- in particular, the role of a "dispersive" structure in promoting efficient convergence to global optimizers without the need to explicitly leverage second-order information. We sketch connections to broader families of benign nonconvex problems in machine learning and signal processing, in which efficient methods obtain global optima independent of initialization. These problems include variants of sparse dictionary learning, tensor decomposition, and phase recovery.

Joint work with Yuqian Zhang, Yenson Lau, Han-Wen Kuo, Dar Gilboa, Sky Cheung, Abhay Pasupathy

Fri, 08 Mar 2019

12:00 - 13:00

Programmatically Structured Representations for Robust Autonomy in Robots

Subramanian Ramamoorthy

(University of Edinburgh and FiveAI)

Abstract

A defining feature of robotics today is the use of learning and autonomy in the inner loop of systems that are actually being deployed in the real world, e.g., in autonomous driving or medical robotics. While it is clear that useful autonomous systems must learn to cope with a dynamic environment, requiring architectures that address the richness of the worlds in which such robots must operate, it is also equally clear that ensuring the safety of such systems is the single biggest obstacle preventing scaling up of these solutions. I will discuss an approach to system design that aims at addressing this problem by incorporating programmatic structure in the network architectures being used for policy learning. I will discuss results from two projects in this direction.

Firstly, I will present the perceptor gradients algorithm – a novel approach to learning symbolic representations based on the idea of decomposing an agent’s policy into i) a perceptor network extracting symbols from raw observation data and ii) a task encoding program which maps the input symbols to output actions. We show that the proposed algorithm is able to learn representations that can be directly fed into a Linear-Quadratic Regulator (LQR) or a general purpose A* planner. Our experimental results confirm that the perceptor gradients algorithm is able to efficiently learn transferable symbolic representations as well as generate new observations according to a semantically meaningful specification.

Next, I will describe work on learning from demonstration where the task representation is that of hybrid control systems, with emphasis on extracting models that are explicitly verifi able and easily interpreted by robot operators. Through an architecture that goes from the sensorimotor level involving fitting a sequence of controllers using sequential importance sampling under a generative switching proportional controller task model, to higher level modules that are able to induce a program for a visuomotor reaching task involving loops and conditionals from a single demonstration, we show how a robot can learn tasks such as tower building in a manner that is interpretable and eventually verifiable.

References:

1. S.V. Penkov, S. Ramamoorthy, Learning programmatically structured representations with preceptor gradients, In Proc. International Conference on Learning Representations (ICLR), 2019. http://rad.inf.ed.ac.uk/data/publications/2019/penkov2019learning.pdf

2. M. Burke, S.V. Penkov, S. Ramamoorthy, From explanation to synthesis: Compositional program induction for learning from demonstration, https://arxiv.org/abs/1902.10657

Fri, 01 Mar 2019

12:00 - 13:00

Modular, Infinite, and Other Deep Generative Models of Data

Charles Sutton

(University of Edinburgh)

Abstract

Deep generative models provide powerful tools for fitting difficult distributions such as modelling natural images. But many of these methods, including variational autoencoders (VAEs) and generative adversarial networks (GANs), can be notoriously difficult to fit.

One well-known problem is mode collapse, which means that models can learn to characterize only a few modes of the true distribution. To address this, we introduce VEEGAN, which features a reconstructor network, reversing the action of the generator by mapping from data to noise. Our training objective retains the original asymptotic consistency guarantee of GANs, and can be interpreted as a novel autoencoder loss over the noise.

Second, maximum mean discrepancy networks (MMD-nets) avoid some of the pathologies of GANs, but have not been able to match their performance. We present a new method of training MMD-nets, based on mapping the data into a lower dimensional space, in which MMD training can be more effective. We call these networks Ratio-based MMD Nets, and show that somewhat mysteriously, they have dramatically better performance than MMD nets.

A final problem is deciding how many latent components are necessary for a deep generative model to fit a certain data set. We present a nonparametric Bayesian approach to this problem, based on defining a (potentially) infinitely wide deep generative model. Fitting this model is possible by combining variational inference with a Monte Carlo method from statistical physics called Russian roulette sampling. Perhaps surprisingly, we find that this modification helps with the mode collapse problem as well.

Fri, 22 Feb 2019

12:00 - 13:00

The Maximum Mean Discrepancy for Training Generative Adversarial Networks

Arthur Gretton

(UCL Gatsby Computational Neuroscience Unit)

Abstract

Generative adversarial networks (GANs) use neural networks as generative models, creating realistic samples that mimic real-life reference samples (for instance, images of faces, bedrooms, and more). These networks require an adaptive critic function while training, to teach the networks how to move improve their samples to better match the reference data. I will describe a kernel divergence measure, the maximum mean discrepancy, which represents one such critic function. With gradient regularisation, the MMD is used to obtain current state-of-the art performance on challenging image generation tasks, including 160 × 160 CelebA and 64 × 64 ImageNet. In addition to adversarial network training, I'll discuss issues of gradient bias for GANs based on integral probability metrics, and mechanisms for benchmarking GAN performance.

Fri, 15 Feb 2019

12:00 - 13:00

Some optimisation problems in the Data Science Division at the National Physical Laboratory

Stephane Chretien

(National Physical Laboratory)

Abstract

Data science has become a topic of great interest lately and has triggered new widescale research activities around efficientl first order methods for optimisation and Bayesian sampling. The National Physical Laboratory is addressing some of these challenges with particular focus on robustness and confidence in the solution. In this talk, I will present some problems and recent results concerning i. robust learning in the presence of outliers based on the Median of Means (MoM) principle and ii. stability of the solution in super-resolution (joint work with A. Thompson and B. Toader).

Fri, 08 Feb 2019

12:00 - 13:00

Leveraging the Signature for Landmark-based Human Action Recognition

Weixin Yang

(University of Oxford)

Abstract

Landmark-based human action recognition in videos is a challenging task in computer vision. One crucial step is to design discriminative features for spatial structure and temporal dynamics. To this end, we use and refine the path signature as an expressive, robust, nonlinear, and interpretable representation for landmark-based streamed data. Instead of extracting signature features from raw sequences, we propose path disintegrations and transformations as preprocessing to improve the efficiency and effectiveness of signature features. The path disintegrations spatially localize a pose into a collection of m-node paths from which the signatures encode non-local and non-linear geometrical dependencies, while temporally transform the evolutions of spatial features into hierarchical spatio-temporal paths from which the signatures encode long short-term dynamical dependencies. The path transformations allow the signatures to further explore correlations among different informative clues. Finally, all features are concatenated to constitute the input vector of a linear fully-connected network for action recognition. Experimental results on four benchmark datasets demonstrated that the proposed feature sets with only linear network achieves comparable state-of-the-art result to the cutting-edge deep learning methods.

Fri, 25 Jan 2019

12:00 - 13:00

Deep learning on graphs and manifolds: going beyond Euclidean data

Michael Bronstein

(Imperial College London)

Abstract

In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in various fields from computer vision to speech recognition. So far research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, high energy physics, recommendation systems, and social media analysis. The adoption of deep learning in these fields has been lagging behind until recently, primarily since the non-Euclidean nature of objects dealt with makes the very definition of basic operations used in deep networks rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, overview existing solutions and outline the key difficulties and future research directions. As examples of applications, I will show problems from the domains of computer vision, graphics, high-energy physics, and fake news detection.

Fri, 30 Nov 2018

12:00 - 12:30

I'm not a number: Social data science at the Oxford Internet Institute

Scott Hale

(Oxford Internet Institute)

Abstract

The social sciences are undergoing a profound shift as new data and methods emerge to study human behaviour. These data offer tremendous opportunity but also mathematical and statistical challenges that the field has yet to fully understand. This talk will give an overview of social data science research faculty are undertaking at the Oxford Internet Institute, a multidisciplinary department of the University. Projects include studying the flow of information across languages, the role of political bots, and volatility in public attention.

Fri, 16 Nov 2018

12:00 - 13:00

Topological adventures in neuroscience

Kathryn Hess

(École Polytechnique Fédérale de Lausanne (EPFL))

Abstract

Over the past decade, and particularly over the past five years, research at the interface of topology and neuroscience has grown remarkably fast. In this talk I will briefly survey a few quite different applications of topology to neuroscience in which members of my lab have been involved over the past four years: the algebraic topology of brain structure and function, topological characterization and classification of neuron morphologies, and (if time allows) topological detection of network dynamics.

Fri, 09 Nov 2018

12:30 - 13:00

Using signatures to predict amyotrophic lateral sclerosis progression

Imanol Pérez

(University of Oxford)

Abstract

Medical data often comes in multi-modal, streamed data. The challenge is to extract useful information from this data in an environment where gathering data is expensive. In this talk, I show how signatures can be used to predict the progression of the ALS disease.

Fri, 09 Nov 2018

12:00 - 12:30

Detection of Transient Data using the Signature Features

Hao Ni

(University College London)

Abstract

In this talk, we consider the supervised learning problem where the explanatory variable is a data stream. We provide an approach based on identifying carefully chosen features of the stream which allows linear regression to be used to characterise the functional relationship between explanatory variables and the conditional distribution of the response; the methods used to develop and justify this approach, such as the signature of a stream and the shuffle product of tensors, are standard tools in the theory of rough paths and provide a unified and non-parametric approach with potential significant dimension reduction. We apply it to the example of detecting transient datasets and demonstrate the superior effectiveness of this method benchmarked with supervised learning methods with raw data.

Fri, 26 Oct 2018

12:00 - 13:00

Stochastic perturbations of singular polynomial eigenvalue problems

Martin Lotz

(University of Warwick)

Abstract

One occasionally encounters computational problems that work just fine on ill-posed inputs, even though they should not. One example is polynomial eigenvalue problems, where standard algorithms such as QZ can find a desired solution to instances with infinite condition number to machine precision, while being completely oblivious to the ill-conditioning of the problem. One explanation is that, intuitively, adversarial perturbations are extremely unlikely, and "for all practical purposes'' the problem might not be ill-conditioned at all. We analyse perturbations of singular polynomial eigenvalue problems and derive methods to bound the likelihood of adversarial perturbations for any given input in different stochastic models.

Joint work with Vanni Noferini