More About Us
Data science is an inherently interdisciplinary research area, joining more established topics within mathematics and across longstanding departments. Faculty in the data science research group often conduct research both within data science and in other fields. Below is a list of recent (as of 2022) data science research by our data science faculty.
Within the context of data science, Baker’s research has two main themes: first, the development of novel methodologies to connect complicated mathematical and computational models with data, and second, the development of methods to learn interpretable mathematical models directly from spatio-temporal data. In the context of model calibration, they have developed a new method, “multi-level ABC”, that enables the calibration of complicated models to data through the use of a hierarchy of approximate models in a likelihood-free Bayesian inference scheme. An important aspect of this work is the use of variance reduction techniques from stochastic analysis to couple model simulations and increase the acceptance rate of parameter samples. In the context of data-driven model construction, they have, for example, developed sparse-regression-based approaches in a Bayesian framework to learn models from noisy spatio-temporal data in a way that allows the uncertainty in the learned model and its parameters to be quantified. Going forward, a key focus will be the further development of these approaches, coupled with exploration of how to integrate model-predictive control into the resulting frameworks. In 2018, Baker organised the conference Mathematical and Statistical Challenges in Bridging Model Development, Parameter Identification and Model Selection in the Biological Sciences at BIRS.
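The likelihood-free idea underlying approximate Bayesian computation can be illustrated with a minimal rejection sampler (a sketch only: the multi-level method described above additionally couples a hierarchy of approximate models, which is not reproduced here, and all function names below are hypothetical):

```python
import random

def abc_rejection(simulate, observed, sample_prior, distance, eps, n_draws):
    """Basic ABC rejection: keep prior draws whose simulated summary
    statistic lies within eps of the observed one. No likelihood is
    ever evaluated; only forward simulation of the model is needed."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return accepted

# Toy example: infer the mean of a Gaussian (known sd = 1) from an
# observed sample mean of 2.0, using the mean of 50 draws as summary.
random.seed(0)
posterior = abc_rejection(
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(50)) / 50,
    observed=2.0,
    sample_prior=lambda: random.uniform(-5, 5),
    distance=lambda a, b: abs(a - b),
    eps=0.2,
    n_draws=5000,
)
posterior_mean = sum(posterior) / len(posterior)
```

The acceptance rate of such a sampler degrades quickly as eps shrinks or the model becomes expensive, which is exactly the bottleneck that hierarchies of cheaper approximate models and coupled simulations are designed to relieve.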
Efficient Bayesian inference for mechanistic modelling with high-throughput data
S. Martina-Perez, H. Sailem, and R. E. Baker
Bayesian uncertainty quantification for data-driven equation learning
S. Martina-Perez, M. J. Simpson, and R. E. Baker
Proc. Royal Society A, 2021
Multifidelity approximate Bayesian computation with sequential Monte Carlo parameter sampling
T. P. Prescott and R. E. Baker
SIAM/ASA J. on Uncertainty Quantification, 2021
Cartis’ research focus is on the development, analysis and implementation of algorithms for nonconvex optimization, with applications in signal processing, climate modelling and machine learning. In particular, randomised and stochastic algorithms have been a recent focus. She is an associate editor for the SIAM Journal on Mathematics of Data Science, amongst other journals.
Convergence rate analysis of a stochastic trust-region method via supermartingales
J Blanchet, C Cartis, M Menickelly, K Scheinberg
INFORMS Journal on Optimization 1 (2), 92-119
Global convergence rate analysis of unconstrained optimization methods based on probabilistic models
C Cartis, K Scheinberg
Mathematical Programming 169 (2), 337-375, 2018
Global rates of convergence for nonconvex optimization on manifolds
N Boumal, PA Absil, C Cartis
IMA Journal of Numerical Analysis 39 (1), 1-33, 2019
Cohen’s research looks at problems at the interface of decision making, statistics, data science, finance and economics. In particular, he is interested in the underlying mathematics behind the use of probabilistic and statistical models in decision making, the design and properties of these models, and the numerical challenges involved in inference. He is PI for the Turing-ONS partnership, PI for the Turing-AFM (Dutch financial regulator) partnership, and co-theme lead for Machine Learning in Finance at the Turing; he is also involved in various other projects around the Turing (e.g. DECOVID). He is the Program Director for the SIAM activity group on Financial Mathematics and Engineering, and so will be co-organizing the next SIAM FME conference.
Gittins’ theorem under uncertainty
S. N. Cohen and T. Treetanthiploet
Electronic Journal of Probability 27, 1-48.
Arbitrage-free neural-SDE market models
S. N. Cohen, C. Reisinger, and S. Wang
Cont’s current research directions are in the mathematical foundations of deep learning and applications of machine learning in finance. One recent contribution has been to uncover novel asymptotic properties of deep ResNets and link them to a class of forward-backward stochastic differential equations, showing in particular that ‘neural ODEs’ (Chen et al. 2018) are not the only possible scaling limits of ResNets. This opens the way to the use of stochastic control methods for the training of ResNets, which he is currently exploring with R. Xu (USC). Cont serves as Scientific Advisor to several UK and EU tech start-ups focused on applications of data science (ML, deep learning) to industrial problems: InstaDeep (www.instadeep.com), Mosaic SmartData (https://mosaicsmartdata.com/), and 73Strings (https://www.73strings.com/).
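The classical scaling regime mentioned above can be sketched in one dimension: when residual updates are scaled by 1/depth, the forward pass is an Euler discretisation of an ODE, which is the ‘neural ODE’ limit. The cited work shows that other scalings lead instead to SDE-type limits; the toy code below (with an assumed scalar residual function, not a trained network) illustrates only the ODE case:

```python
import math

def resnet_forward(x0, depth, residual):
    """Toy 1-D ResNet: x_{k+1} = x_k + residual(x_k) / depth.
    The 1/depth factor makes the forward pass an Euler scheme for
    the ODE x'(t) = residual(x(t)) on t in [0, 1]."""
    x = x0
    for _ in range(depth):
        x = x + residual(x) / depth
    return x

# With residual(x) = x, the depth -> infinity limit solves x' = x,
# so the output approaches x0 * e as the network gets deeper.
deep_out = resnet_forward(1.0, 10_000, lambda x: x)
```

Here `deep_out` is `(1 + 1/10000) ** 10000`, which is close to `math.e`; with other weight scalings the increments need not vanish deterministically, which is what produces the stochastic limits studied in the paper.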
Universal Features of Price Formation in Financial Markets: Perspectives From Deep Learning,
J Sirignano and R Cont
Quantitative Finance Vol. 19, No. 9, 1449-1459, 2019.
Scaling properties of deep residual networks,
A Cohen, R Cont, A Rossier, Renyuan Xu (2021)
International Conference on Machine Learning (ICML 2021).
TailGAN: Nonparametric scenario generation for tail risk estimation
R Cont, Mihai Cucuringu, Renyuan Xu, Chao Zhang (2021)
Farmer’s work focuses on how agent-based models can be calibrated to data so that they can be used to make good time series predictions and reliable policy analysis. For example, his group recently built a model for the effect of the COVID pandemic on the UK economy that produced very accurate forecasts in real time. Much of his work involves connections to network science; applications include patents, economic forecasting and technological change. Farmer is on the advisory board of the data science division of IHS Markit (one of the leading data providers), and will be presenting to the UK Prime Minister’s top economic advisors at Number 10 Downing St. in early May 2022.
Empirically grounded energy forecasts and the energy transition
R. Way, M. Ives, P. Mealy and J.D. Farmer,
Technological interdependencies predict innovation dynamics
Pichler, A., Lafond, F. and Farmer J.D.
Occupational mobility and automation: A data-driven network model
del Rio Chanona, R. M., P. Mealy, M. Beguerisse-Diaz, F. Lafond and J.D. Farmer,
Journal of the Royal Society Interface
Goriely’s work in data science is on multiple fronts: the development of Bayesian methods for model inference (with FMRIB and Roche), the development of TDA methods for tree structures with application to neuronal networks (with Heather, submitted), systematic probabilistic connectome construction from MRI data (with FMRIB), the analysis of large patient databases for neurodegenerative diseases (with SIMULA and BioFINDER), and dynamics on networks. The general description would be the development and application of methods for data mining and model validation in neuroscience. Accordingly, Goriely works with UK-BIOFINDER, ADNI, and BIOFINDER as the main sources of data. Not surprisingly, data science methods are playing a central role in this field. As Editor-in-Chief of the Journal of Nonlinear Science and of Brain Multiphysics, Goriely receives and processes a substantial number of papers in the area of data science and machine learning, typically with applications in dynamical systems or neuroimaging.
Braiding Braak and Braak: Staging patterns and model selection in network neurodegeneration
P. Putra, T. B. Thompson, P. Chaggar, and A. Goriely
Network Neuroscience, 5 (4) 929-956 (2021)
Global and local mobility as a barometer for COVID-19 dynamics
K. Linka, A. Goriely, and E. Kuhl
Biomechanics and Modeling in Mechanobiology
An autonomous oscillation times and executes centriole biogenesis
M. G. Aydogan and 13 additional authors
Cell 181(7), 2020.
Grindrod’s research focus is on the theory of dynamically evolving networks, including full coupling through time-dependent network dynamics and scaling, with applications of mathematics to social media, digital media and marketing, and the digital economy. His work also covers mathematical modelling of human consciousness and neuromorphic information processing, and modelling for counter-terrorism and online threats and harms. Grindrod has worked with Jaywing plc, dstl/MOD, the AI Council, the ICO, and Emirates. He is Chairman of two data science start-up companies, Hare Analytics Ltd (www.hareanalytics.com) and GTT Analytics (www.gttanalytics.com), and a Founding Trustee of the Alan Turing Institute.
Cortex-Like Complex Systems: What Occurs Within?
P. Grindrod and C. Lester.
Front. Appl. Math. Stat., 24 September 2021
P. Grindrod (2021)
The Activity of the Far Right on Telegram v2.11
A. Bovet and P. Grindrod (2020)
Harrington is particularly interested in developing methods for integrating data types in biology (e.g., spatial and omics data), as well as quantifying and predicting multiscale biological systems. Her research relies on comparing mechanistic models and data by developing approaches based on topology, algebra, statistics, optimization and networks. Harrington is a co-director of the Centre for TDA, which has approximately 50 members. She is an editor of AIMS Foundations of Data Science and a book series editor of the Springer series Mathematics of Data. Harrington is a member of the Turing-Roche Expert Advisory Panel, as well as on the advisory boards of "Algebra, Topology, Geometry in Life Sciences" and the CHIMERA EPSRC Healthcare Hub. She is giving talks at machine learning conferences such as the 2nd Workshop on Geometrical & Topological Representation Learning at ICLR 2022 and London Geometry and Machine Learning (LOGML), and will participate in Geometry, Topology and Statistics in Data Sciences at IHP.
A blood atlas of COVID-19 defines hallmarks of disease severity and specificity.
COMBAT Consortium (203 authors).
To appear in Cell. Available at medRxiv.
Principal Components along Quiver Representations.
Seigal A, Harrington HA, Nanda V.
To appear in Foundations of Computational Mathematics. Available at arXiv:2104.10666.
Multi-parameter persistent homology landscapes identify immune cell spatial patterns in tumors
Vipond O, Bull JA, Macklin PS, Tillmann U, Pugh CW, Byrne HM, Harrington HA
Proc Nat Acad Sci 118 (41), e2102166118 (2021)
Lambiotte’s research focuses on large networks. He is interested in developing novel algorithms to extract useful information from the myriad of connections forming a network, primarily via community detection (clustering of the nodes) and the characterisation of temporal networks. His research is rooted in the analysis of empirical data, with collaborations in neuroimaging, social networks, human behaviour and retail. Lambiotte is a Turing Fellow. In 2019 he organised Netmob (https://netmob.org/), the main conference on the scientific analysis of mobile phone datasets. He will co-organise the SIAM Workshop on Network Science in September 2022 and the Graph Learning workshop at TheWebConf 2022, and serves on the program committees of conferences such as ICWSM and WSDM. Lambiotte is a scientific advisor to the start-up Pometry (https://www.pometry.com/), which aims to develop distributed algorithms for the analysis of large graphs.
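As a minimal sketch of what community detection does (illustrative only; this is the standard label-propagation heuristic on a toy graph, not one of the algorithms from the papers below):

```python
import random

def label_propagation(adj, n_rounds=20, seed=1):
    """Each node repeatedly adopts the most common label among its
    neighbours; densely connected groups of nodes tend to converge
    to a shared label, which then identifies a community."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # start with one label per node
    order = list(adj)
    for _ in range(n_rounds):
        rng.shuffle(order)                # asynchronous, random order
        for v in order:
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            top = max(counts.values())
            # break ties at random among the most frequent labels
            labels[v] = rng.choice(sorted(l for l, c in counts.items() if c == top))
    return labels

# Two triangles joined by a single bridge edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(adj)
```

Label propagation is fast but unstable under tie-breaking; the methods in the publications below tackle harder settings, such as detecting communities when the edges themselves are not directly observed.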
DEBAGREEMENT: A comment-reply dataset for (dis)agreement detection in online debates
Pougué-Biyong, John, et al.
Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2021.
Community detection in networks without observing edges.
Hoffmann, Till, et al.
Science Advances 6 (4) (2020): eaav1478.
Variance and covariance of distributions on graphs.
Devriendt, Karel, Samuel Martin-Gutierrez, and Renaud Lambiotte.
SIAM Review, in press (2022)
Nanda’s work is primarily within applied and computational algebraic topology, which subsumes the field of topological data analysis. Lately his interests have focused on the interaction between data science and singularity theory. Two key challenges in this endeavour are: (a) data stratification: the detection of singularities in large datasets, and (b) singular optimization: algorithms for gradient descent over singular spaces. Until recently, he was a Turing fellow. As such, Nanda (a) organised the Theory and Algorithms in Data Science seminar (joint with Mihai Cucuringu), and (b) served as coordinator of the Topology & Geometry of data research group. Before starting his paternity leave, he also organised the Data Science seminar at the Maths Institute for two years (first by himself, and then jointly with Anna Seigal).
B. J. Stolz, J. Tanner, H. Harrington, and V. Nanda
Persistence paths and signature features in topological data analysis
I. Chevyrev, V. Nanda, and H. Oberhauser
IEEE Transactions on Pattern Analysis and Machine Intelligence
Dist2Cycle: A simplicial neural network for homology localization
D. Keros, V. Nanda, and K. Subr
Oberhauser is interested in the use of stochastic processes in data science. In particular, he works on connecting ideas from rough paths to kernel learning, leading to signature kernels; efficient ways to describe high-dimensional probability measures with so-called recombination methods; and approaches to topological data analysis coming from stochastic analysis. Oberhauser is a co-investigator on Terry Lyons' DataSig grant, a member of the CIMDA Oxford-Hong Kong initiative, a visiting researcher at the ATI, and an associate editor at the SIAM Journal on Financial Mathematics; he also works with GCHQ on MCMC methods, and organises workshops and conferences, the next being "New Interfaces of Stochastic Analysis" at BIRS in September.
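The signature features that underlie these kernels can be computed in closed form for piecewise-linear paths. A minimal depth-2 sketch (the signature-kernel methods themselves work at higher or untruncated depth, which is not reproduced here):

```python
def signature_level2(points):
    """Depth-2 signature of the piecewise-linear path through `points`
    in R^d: level 1 is the total increment S^i, level 2 collects the
    iterated integrals S^{ij} = integral of (x^i - x^i_0) dx^j."""
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(points, points[1:]):
        dx = [bi - ai for ai, bi in zip(a, b)]
        for i in range(d):
            for j in range(d):
                # Chen's identity: cross term from the path so far plus
                # the straight segment's own contribution dx_i * dx_j / 2.
                s2[i][j] += s1[i] * dx[j] + dx[i] * dx[j] / 2.0
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2

# L-shaped path (0,0) -> (1,0) -> (1,1): both level-1 terms equal 1,
# and s2[0][1] - s2[1][0] is twice the signed (Levy) area between
# the path and its chord.
s1, s2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

The symmetric part of level 2 carries no new information (the shuffle identity gives s2[i][j] + s2[j][i] = s1[i] * s1[j]); it is the antisymmetric area-like terms, and their higher-depth analogues, that make signatures informative features for sequential data.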
Neural SDEs as Infinite-Dimensional GANs
Patrick Kidger, James Foster, Xuechen Li, Terry Lyons.
Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections
Csaba Toth, Patric Bonnier, and Harald Oberhauser.
A Randomized Algorithm to Reduce the Support of Discrete Measures
Francesco Cosentino, Harald Oberhauser, and Alessandro Abate.
NeurIPS 2020 (spotlight paper)
Lyons is currently PI of the DataSig program (primarily funded by EPSRC) and of the complementary research programme CIMDA-Oxford. The focus of the DataSig group is on the mathematics of multidimensional data that evolves. Lyons’s long-term research interests are all focused on rough paths, stochastic analysis, and applications. Lyons is on the Advisory Committee of the new Microsoft Research Asia Theory Center (https://www.datasig.ac.uk/article/msra). He recently organized a meeting at ICERM; the group also held an online meeting at Oberwolfach, two meetings at the Newton Institute for industry collaborators, and two workshops at the RSS.
The Signature Kernel is the solution of a Goursat PDE
Cristopher Salvi, Thomas Cass, James Foster, Terry Lyons and Weixin Yang
SIAM Journal on Mathematics of Data Science, vol. 3, no. 3, pp. 873–99 (9 Sep 2021)
Neural Rough Differential Equations for Long Time Series
James Morrill, Cristopher Salvi, Patrick Kidger, James Foster and Terry Lyons
Proceedings of the 38th International Conference on Machine Learning (PMLR) 2021, vol. 139, pp. 7829–38 (1 July 2021)
Tanner’s research focus is on the design, analysis, and application of algorithms for information-inspired problems. His main contributions have been in low-complexity models, such as compressed sensing, low-rank matrix completion, sparse measures, and mixed models; the application of such methods to improve medical MRI; and, more recently, the development of theory for deep learning. He is founding Editor-in-Chief of Information and Inference: A Journal of the IMA, published by Oxford University Press, and a member of the editorial boards of Applied and Computational Harmonic Analysis and SIAM Multiscale Modeling & Simulation. Tanner was Oxford University Turing Lead from 2016 to 2020.
Activation function design for deep networks: linearity and effective initialisation
M. Murray, V. Abrol, and J. Tanner
Applied and Computational Harmonic Analysis, accepted (2021).
Matrix rigidity and the ill-posedness of Robust PCA and matrix completion
J. Tanner, A. Thompson, and S. Vary
SIAM Journal on Mathematics of Data Science, Vol. 1(3) (2019), 537-554.
Dense for the price of sparse: improved performance of sparsely initialized networks via a subspace offset
I. Price and J. Tanner
International Conference on Machine Learning (ICML), July 2021.
Most of Tillmann’s effort in the area of topological data analysis (TDA) has been in collaboration with students and members of the new EPSRC-funded, Oxford-based Centre for TDA. An important overall theme for the centre is that the work is driven by applications via a two-way exchange: applications require new computational tools and stimulate new theoretical investigations, and in turn new theoretical developments and algorithms are implemented and tested on concrete data science problems. The scope of the work is deliberately broad, allowing individual students and post-docs to concentrate on different topics while ensuring impact. Highlights include a comprehensive survey of computational tools for TDA [1], a thorough theoretical study of the differentiability of the persistence map (PH) [2] with a view to combining machine learning with TDA, contributions to our understanding of random topology [3] informing the null hypothesis in TDA, a thorough study of the fibre of the PH map [4] to understand information loss in TDA, and finally the use of these tools in relevant and meaningful applications [5]. Tillmann sits on the Cantab Scientific Advisory Board. She was Chair of the Alan Turing Institute Programme Committee (Jan 2016 - Jun 2017). Tillmann has organised the following conferences/events: the Turing scoping meeting, Oxford 2015; the LMS-Clay Summer School, Oxford 2015; Spires, TDA Centre Oxford 2019; and ATMCS 10, Oxford 2022.
A roadmap for the computation of persistent homology,
N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, and H. A. Harrington
EPJ Data Science, 2017
A framework for differential calculus on persistence barcodes
J. Leygonie, S. Oudot, and U. Tillmann
Foundations of Computational Mathematics, 2021
Multiparameter persistent homology landscapes identify immune cell spatial patterns in tumors
O. Vipond, J. A. Bull, P. S. Macklin, U. Tillmann, C. W. Pugh, H. M. Byrne, and H. A. Harrington
Proceedings of the National Academy of Sciences, 2021