More About Us
Data science is an inherently interdisciplinary research area, joining more established topics within mathematics and across longstanding departments. Faculty in the data science research group often conduct research both within data science and in other fields. Below is a list of recent (as of 2022) data science research by our data science faculty.
Within the context of data science, Baker’s research has two main themes: first, the development of novel methodologies to connect complicated mathematical and computational models with data, and second, the development of methods to learn interpretable mathematical models directly from spatio-temporal data. In the context of model calibration, they have developed a new method, “multi-level ABC”, that enables the calibration of complicated models to data through the use of a hierarchy of approximate models in a likelihood-free Bayesian inference scheme. An important aspect of this work is the use of variance reduction techniques from stochastic analysis to couple model simulations and increase the acceptance rate of parameter samples. In the context of data-driven model construction, they have, for example, developed sparse-regression-based approaches in a Bayesian framework to learn models from noisy spatio-temporal data in a way that allows the uncertainty in the learned model and its parameters to be quantified. Going forward, a key focus will be the further development of these approaches, coupled with exploration of how to integrate model-predictive control into the resulting frameworks. In 2018, Baker organised the conference Mathematical and Statistical Challenges in Bridging Model Development, Parameter Identification and Model Selection in the Biological Sciences at BIRS.
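The likelihood-free idea underlying approximate Bayesian computation can be illustrated with a minimal rejection sampler (a sketch only: the multi-level method described above additionally couples a hierarchy of approximate models, which is not reproduced here, and all function names below are hypothetical):

```python
import random

def abc_rejection(simulate, observed, sample_prior, distance, eps, n_draws):
    """Basic ABC rejection: keep prior draws whose simulated summary
    statistic lies within eps of the observed one. No likelihood is
    ever evaluated; only forward simulation of the model is needed."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return accepted

# Toy example: infer the mean of a Gaussian (known sd = 1) from an
# observed sample mean of 2.0, using the mean of 50 draws as summary.
random.seed(0)
posterior = abc_rejection(
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(50)) / 50,
    observed=2.0,
    sample_prior=lambda: random.uniform(-5, 5),
    distance=lambda a, b: abs(a - b),
    eps=0.2,
    n_draws=5000,
)
posterior_mean = sum(posterior) / len(posterior)
```

The acceptance rate of such a sampler degrades quickly as eps shrinks or the model becomes expensive, which is exactly the bottleneck that hierarchies of cheaper approximate models and coupled simulations are designed to relieve.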
Efficient Bayesian inference for mechanistic modelling with high-throughput data
S. Martina-Perez, H. Sailem, and R. E. Baker
Bayesian uncertainty quantification for data-driven equation learning
S. Martina-Perez, M. J. Simpson, and R. E. Baker
Proc. Royal Society A, 2021
Multifidelity approximate Bayesian computation with sequential Monte Carlo parameter sampling
T. P. Prescott and R. E. Baker
SIAM/ASA J. on Uncertainty Quantification, 2021
Cartis’ research focus is on the development, analysis and implementation of algorithms for nonconvex optimization, with applications in signal processing, climate modelling and machine learning. In particular, randomised and stochastic algorithms have been a recent focus. She is an associate editor for the SIAM Journal on Mathematics of Data Science, amongst other journals.
Convergence rate analysis of a stochastic trust-region method via supermartingales
J Blanchet, C Cartis, M Menickelly, K Scheinberg
INFORMS Journal on Optimization 1 (2), 92-119
Global convergence rate analysis of unconstrained optimization methods based on probabilistic models
C Cartis, K Scheinberg
Mathematical Programming 169 (2), 337-375, 2018
Global rates of convergence for nonconvex optimization on manifolds
N Boumal, PA Absil, C Cartis
IMA Journal of Numerical Analysis 39 (1), 1-33, 2019
Cohen’s research looks at problems at the interface of decision making, statistics, data science, finance and economics. In particular, he is interested in the underlying mathematics behind the use of probabilistic and statistical models in decision making, the design and properties of these models, and the numerical challenges involved in inference. He is PI for the Turing-ONS partnership, PI for the Turing-AFM (Dutch financial regulator) partnership, and co-theme lead for Machine Learning in Finance at the Turing; he is also involved in various other projects around the Turing (e.g. DECOVID). He is the Program Director for the SIAM activity group on Financial Mathematics and Engineering, and so will be co-organizing the next SIAM FME conference.
Gittins’ theorem under uncertainty
S. N. Cohen and T. Treetanthiploet
Electronic Journal of Probability 27, 1-48.
Arbitrage-free neural-SDE market models
S. N. Cohen, C. Reisinger, and S. Wang
Cont’s current research directions are in the mathematical foundations of deep learning and applications of machine learning in finance. One recent contribution has been to uncover novel asymptotic properties of deep ResNets and link them to a class of forward-backward stochastic differential equations, showing in particular that ‘neural ODEs’ (Chen et al. 2018) are not the only possible scaling limits of ResNets. This opens the way to the use of stochastic control methods for the training of ResNets, which he is currently exploring with R. Xu (USC). Cont serves as Scientific Advisor to several UK and EU tech start-ups focused on applications of data science (ML, deep learning) to industrial problems: InstaDeep (www.instadeep.com), Mosaic SmartData (https://mosaicsmartdata.com/), and 73Strings (https://www.73strings.com/).
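The classical scaling regime mentioned above can be sketched in one dimension: when residual updates are scaled by 1/depth, the forward pass is an Euler discretisation of an ODE, which is the ‘neural ODE’ limit. The cited work shows that other scalings lead instead to SDE-type limits; the toy code below (with an assumed scalar residual function, not a trained network) illustrates only the ODE case:

```python
import math

def resnet_forward(x0, depth, residual):
    """Toy 1-D ResNet: x_{k+1} = x_k + residual(x_k) / depth.
    The 1/depth factor makes the forward pass an Euler scheme for
    the ODE x'(t) = residual(x(t)) on t in [0, 1]."""
    x = x0
    for _ in range(depth):
        x = x + residual(x) / depth
    return x

# With residual(x) = x, the depth -> infinity limit solves x' = x,
# so the output approaches x0 * e as the network gets deeper.
deep_out = resnet_forward(1.0, 10_000, lambda x: x)
```

Here `deep_out` is `(1 + 1/10000) ** 10000`, which is close to `math.e`; with other weight scalings the increments need not vanish deterministically, which is what produces the stochastic limits studied in the paper.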
Universal Features of Price Formation in Financial Markets: Perspectives From Deep Learning,
J Sirignano and R Cont
Quantitative Finance Vol. 19, No. 9, 1449-1459, 2019.
Scaling properties of deep residual networks,
A Cohen, R Cont, A Rossier, Renyuan Xu (2021)
International Conference on Machine Learning (ICML 2021).
TailGAN: Nonparametric scenario generation for tail risk estimation
R Cont, Mihai Cucuringu, Renyuan Xu, Chao Zhang (2021)
Farmer’s work focuses on how agent-based models can be calibrated to data so that they can be used to make good time series predictions and reliable policy analysis. For example, his group recently built a model for the effect of the COVID pandemic on the UK economy that produced very accurate forecasts in real time. Much of his work involves connections to network science; applications include patents, economic forecasting and technological change. Farmer is on the advisory board of the data science division of IHS Markit (one of the leading data providers), and will be presenting to the UK Prime Minister’s top economic advisors at Number 10 Downing St. in early May 2022.
Empirically grounded energy forecasts and the energy transition
R. Way, M. Ives, P. Mealy and J.D. Farmer,
Technological interdependencies predict innovation dynamics
Pichler, A., Lafond, F. and Farmer J.D.
Occupational mobility and automation: A data-driven network model
del Rio Chanona, R. M., P. Mealy, M. Beguerisse-Diaz, F. Lafond and J.D. Farmer,
Journal of the Royal Society Interface
Goriely’s work in data science is on multiple fronts: the development of Bayesian methods for model inference (with FMRIB and Roche), the development of TDA methods for tree structures with application to neuronal networks (with Heather, submitted), systematic probabilistic connectome construction from MRI data (with FMRIB), the analysis of large patient databases for neurodegenerative diseases (with SIMULA and BioFINDER), and dynamics on networks. The general description would be the development and application of methods for data mining and model validation in neuroscience. Accordingly, Goriely works with UK-BIOFINDER, ADNI, and BIOFINDER as the main sources of data. Not surprisingly, data science methods are playing a central role in this field. As Editor-in-Chief of the Journal of Nonlinear Science and of Brain Multiphysics, Goriely receives and processes a substantial number of papers in the area of data science and machine learning, typically with applications in dynamical systems or neuroimaging.
Braiding Braak and Braak: Staging patterns and model selection in network neurodegeneration
P. Putra, T. B. Thompson, P. Chaggar, and A. Goriely
Network Neuroscience, 5 (4) 929-956 (2021)
Global and local mobility as a barometer for COVID-19 dynamics
K. Linka, A. Goriely, and E. Kuhl
Biomechanics and Modeling in Mechanobiology
An autonomous oscillation times and executes centriole biogenesis
M. G. Aydogan and 13 additional authors
Cell 181(7), 2020.
Grindrod’s research focus is on the theory of dynamically evolving networks, including full coupling through time-dependent network dynamics and scaling, with applications of mathematics to social media, digital media and marketing, and the digital economy. His work also covers mathematical modelling of human consciousness and neuromorphic information processing, and modelling for counter-terrorism and online threats and harms. Grindrod has worked with Jaywing plc, dstl/MOD, the AI Council, the ICO, and Emirates. He is Chairman of two data science start-up companies, Hare Analytics Ltd (www.hareanalytics.com) and GTT Analytics (www.gttanalytics.com), and a Founding Trustee of the Alan Turing Institute.
Cortex-Like Complex Systems: What Occurs Within?
P. Grindrod and C. Lester.
Front. Appl. Math. Stat., 24 September 2021
P. Grindrod (2021)
The Activity of the Far Right on Telegram v2.11
A. Bovet and P. Grindrod (2020)
Harrington is particularly interested in developing methods for integrating data types in biology (e.g., spatial and omics data), as well as quantifying and predicting multiscale biological systems. Her research relies on comparing mechanistic models and data by developing approaches based on topology, algebra, statistics, optimization and networks. Harrington is a co-director of the Centre for TDA, which has approximately 50 members. She is an editor of AIMS Foundations of Data Science and a book series editor of the Springer series Mathematics of Data. Harrington is a member of the Turing-Roche Expert Advisory Panel, as well as on the advisory boards of "Algebra, Topology, Geometry in Life Sciences" and the CHIMERA EPSRC Healthcare Hub. She is giving talks at machine learning conferences such as the 2nd Workshop on Geometrical & Topological Representation Learning at ICLR 2022 and London Geometry and Machine Learning (LOGML), and will participate in Geometry, Topology and Statistics in Data Sciences at IHP.
A blood atlas of COVID-19 defines hallmarks of disease severity and specificity.
COMBAT Consortium (203 authors).
To appear in Cell. Available at medRxiv.
Principal Components along Quiver Representations.
Seigal A, Harrington HA, Nanda V.
To appear in Foundations of Computational Mathematics. Available at arXiv:2104.10666.
Multi-parameter persistent homology landscapes identify immune cell spatial patterns in tumors
Vipond O, Bull JA, Macklin PS, Tillmann U, Pugh CW, Byrne HM, Harrington HA
Proc Nat Acad Sci 118 (41), e2102166118 (2021)
Lambiotte’s research focuses on large networks. He is interested in developing novel algorithms to extract useful information from the myriad of connections forming a network, primarily via community detection (clustering of the nodes) and the characterisation of temporal networks. His research is rooted in the analysis of empirical data, with collaborations in neuroimaging, social networks, human behaviour and retail. Lambiotte is a Turing Fellow. In 2019 he organised Netmob (https://netmob.org/), the main conference on the scientific analysis of mobile phone datasets. He will co-organise the SIAM Workshop on Network Science in September 2022 and the Graph Learning workshop at TheWebConf 2022, and serves on the program committees of conferences such as ICWSM and WSDM. Lambiotte is a scientific advisor to the start-up Pometry (https://www.pometry.com/), which aims to develop distributed algorithms for the analysis of large graphs.
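As a minimal sketch of what community detection does (illustrative only; this is the standard label-propagation heuristic on a toy graph, not one of the algorithms from the papers below):

```python
import random

def label_propagation(adj, n_rounds=20, seed=1):
    """Each node repeatedly adopts the most common label among its
    neighbours; densely connected groups of nodes tend to converge
    to a shared label, which then identifies a community."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # start with one label per node
    order = list(adj)
    for _ in range(n_rounds):
        rng.shuffle(order)                # asynchronous, random order
        for v in order:
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            top = max(counts.values())
            # break ties at random among the most frequent labels
            labels[v] = rng.choice(sorted(l for l, c in counts.items() if c == top))
    return labels

# Two triangles joined by a single bridge edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(adj)
```

Label propagation is fast but unstable under tie-breaking; the methods in the publications below tackle harder settings, such as detecting communities when the edges themselves are not directly observed.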
DEBAGREEMENT: A comment-reply dataset for (dis)agreement detection in online debates
Pougué-Biyong, John, et al.
Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2021.
Community detection in networks without observing edges.
Hoffmann, Till, et al.
Science Advances 6 (4) (2020): eaav1478.
Variance and covariance of distributions on graphs.
Devriendt, Karel, Samuel Martin-Gutierrez, and Renaud Lambiotte.
SIAM Review, in press (2022)
Nanda’s work is primarily within applied and computational algebraic topology, which subsumes the field of topological data analysis. Lately his interests have focused on the interaction between data science and singularity theory. Two key challenges in this endeavour are: (a) data stratification: the detection of singularities in large datasets, and (b) singular optimization: algorithms for gradient descent over singular spaces. Until recently, he was a Turing fellow. As such, Nanda (a) organised the Theory and Algorithms in Data Science seminar (joint with Mihai Cucuringu), and (b) served as coordinator of the Topology & Geometry of data research group. Before starting his paternity leave, he also organised the Data Science seminar at the Maths Institute for two years (first by himself, and then jointly with Anna Seigal).
B. J. Stolz, J. Tanner, H. Harrington, and V. Nanda
Persistence paths and signature features in topological data analysis
I. Chevyrev, V. Nanda, and H. Oberhauser
IEEE Transactions on Pattern Analysis and Machine Intelligence
Dist2Cycle: A simplicial neural network for homology localization
D. Keros, V. Nanda, and K. Subr
Oberhauser is interested in the use of stochastic processes in data science. In particular, he works on connecting ideas from rough paths to kernel learning, leading to signature kernels; efficient ways to describe high-dimensional probability measures with so-called recombination methods; and approaches to topological data analysis coming from stochastic analysis. Oberhauser is a co-investigator on Terry Lyons' DataSig grant, a member of the CIMDA Oxford-Hong Kong initiative, a visiting researcher at the ATI, and an associate editor at the SIAM Journal on Financial Mathematics; he also works with GCHQ on MCMC methods, and organises workshops and conferences, the next being "New Interfaces of Stochastic Analysis" at BIRS in September.
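The signature features that underlie these kernels can be computed in closed form for piecewise-linear paths. A minimal depth-2 sketch (the signature-kernel methods themselves work at higher or untruncated depth, which is not reproduced here):

```python
def signature_level2(points):
    """Depth-2 signature of the piecewise-linear path through `points`
    in R^d: level 1 is the total increment S^i, level 2 collects the
    iterated integrals S^{ij} = integral of (x^i - x^i_0) dx^j."""
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(points, points[1:]):
        dx = [bi - ai for ai, bi in zip(a, b)]
        for i in range(d):
            for j in range(d):
                # Chen's identity: cross term from the path so far plus
                # the straight segment's own contribution dx_i * dx_j / 2.
                s2[i][j] += s1[i] * dx[j] + dx[i] * dx[j] / 2.0
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2

# L-shaped path (0,0) -> (1,0) -> (1,1): both level-1 terms equal 1,
# and s2[0][1] - s2[1][0] is twice the signed (Levy) area between
# the path and its chord.
s1, s2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

The symmetric part of level 2 carries no new information (the shuffle identity gives s2[i][j] + s2[j][i] = s1[i] * s1[j]); it is the antisymmetric area-like terms, and their higher-depth analogues, that make signatures informative features for sequential data.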
Neural SDEs as Infinite-Dimensional GANs
Patrick Kidger, James Foster, Xuechen Li, Terry Lyons.
Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections
Csaba Toth, Patric Bonnier, and Harald Oberhauser.
A Randomized Algorithm to Reduce the Support of Discrete Measures
Francesco Cosentino, Harald Oberhauser, and Alessandro Abate.
NeurIPS 2020 (spotlight paper)
Lyons is currently PI of the DataSig program (primarily funded by EPSRC) and of the complementary research programme CIMDA-Oxford. The focus of the DataSig group is on the mathematics of multidimensional data that evolves. Lyons’s long-term research interests are all focused on rough paths, stochastic analysis, and applications. Lyons is on the Advisory Committee of the new Microsoft Research Asia Theory Center (https://www.datasig.ac.uk/article/msra). He recently organized a meeting at ICERM; the group also held an online meeting at Oberwolfach, two meetings at the Newton Institute for industry collaborators, and two workshops at the RSS.
The Signature Kernel is the solution of a Goursat PDE
Cristopher Salvi, Thomas Cass, James Foster, Terry Lyons and Weixin Yang
SIAM Journal on Mathematics of Data Science, vol. 3, no. 3, pp. 873–99 (9 Sep 2021)
Neural Rough Differential Equations for Long Time Series
James Morrill, Cristopher Salvi, Patrick Kidger, James Foster and Terry Lyons
Proceedings of the 38th International Conference on Machine Learning (PMLR) 2021, vol. 139, pp. 7829–38 (1 July 2021)
Tanner’s research focus is on the design, analysis, and application of algorithms for information-inspired problems. His main contributions have been in low-complexity models, such as compressed sensing, low-rank matrix completion, sparse measures, and mixed models; the application of such methods to improve medical MRI; and, more recently, the development of theory for deep learning. He is founding Editor-in-Chief of Information and Inference: A Journal of the IMA, published by Oxford University Press, and a member of the editorial boards of Applied and Computational Harmonic Analysis and SIAM Multiscale Modeling & Simulation. Tanner was Oxford University Turing Lead from 2016 to 2020.
Activation function design for deep networks: linearity and effective initialisation
M. Murray, V. Abrol, and J. Tanner
Applied and Computational Harmonic Analysis, accepted (2021).
Matrix rigidity and the ill-posedness of Robust PCA and matrix completion
J. Tanner, A. Thompson, and S. Vary
SIAM Journal on Mathematics of Data Science, Vol. 1(3) (2019), 537-554.
Dense for the price of sparse: improved performance of sparsely initialized networks via a subspace offset
I. Price and J. Tanner
International Conference on Machine Learning (ICML), July 2021.
Most of Tillmann’s effort in the area of topological data analysis (TDA) has been in collaboration with students and members of the new EPSRC-funded, Oxford-based Centre for TDA. An important overall theme for the centre is that the work is driven by applications via a two-way exchange: applications require new computational tools and stimulate new theoretical investigations, and in turn new theoretical developments and algorithms are implemented and tested on concrete data science problems. The scope of the work is deliberately broad, allowing individual students and post-docs to concentrate on different topics while ensuring impact. Highlights include a comprehensive survey of computational tools for TDA [1], a thorough theoretical study of the differentiability of the persistence map (PH) [2] with a view to combining machine learning with TDA, contributions to our understanding of random topology [3] informing the null hypothesis in TDA, a thorough study of the fibre of the PH map [4] to understand information loss in TDA, and finally the use of these tools in relevant and meaningful applications [5]. Tillmann sits on the Cantab Scientific Advisory Board. She was Chair of the Alan Turing Institute Programme Committee (Jan 2016 - Jun 2017). Tillmann has organised the following conferences/events: the Turing scoping meeting, Oxford 2015; the LMS-Clay Summer School, Oxford 2015; Spires, TDA Centre Oxford 2019; and ATMCS 10, Oxford 2022.
A roadmap for the computation of persistent homology,
N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, and H. A. Harrington
EPJ Data Science, 2017
A framework for differential calculus on persistence barcodes
J. Leygonie, S. Oudot, and U. Tillmann
Foundations of Computational Mathematics, 2021
Multiparameter persistent homology landscapes identify immune cell spatial patterns in tumors
O. Vipond, J. A. Bull, P. S. Macklin, U. Tillmann, C. W. Pugh, H. M. Byrne, and H. A. Harrington
Proceedings of the National Academy of Sciences, 2021