This talk has been cancelled.

# Past Data Science Seminar

We show that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on smoothness in a neighborhood of a solution. Given that the problem is convex, our method will converge even if the global smoothness constant is infinity. As an illustration, it can minimize arbitrary continuously twice-differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including matrix factorization and training of ResNet-18.

I will present numerical methods for low-rank matrix and tensor problems that explicitly make use of the geometry of rank constrained matrix and tensor spaces. We focus on two types of problems: The first are optimization problems, like matrix and tensor completion, solving linear systems and eigenvalue problems. Such problems can be solved by numerical optimization for manifolds, called Riemannian optimization methods. We will explain the basic elements of differential geometry in order to apply such methods efficiently to rank constrained matrix and tensor spaces. The second type of problem is ordinary differential equations, defined on matrix and tensor spaces. We show how their solution can be approximated by the dynamical low-rank principle, and discuss several numerical integrators that rely in an essential way on geometric properties that are characteristic to sets of low rank matrices and tensors. Based on joint work with André Uschmajew (MPI MiS Leipzig).

In this talk we present p-order methods for unconstrained minimization of convex functions that are p-times differentiable with Hölder continuous p-th derivatives. We establish worst-case complexity bounds for methods with and without acceleration. Some of these methods are "universal", that is, they do not require prior knowledge of the constants that define the smoothness level of the objective function. A lower complexity bound for this problem class is also obtained. This is a joint work with Yurii Nesterov (Université Catholique de Louvain).

In this talk I will analyse ERK time course data by developing mathematical models of enzyme kinetics. I will present how we can use differential algebra and geometry for model identifiability, and topological data analysis to study these the dynamics of ERK. This work is joint with Lewis Marsh, Emilie Dufresne, Helen Byrne and Stanislav Shvartsman.

Machine learning for scientific applications faces the challenge of limited data. To reduce overfitting, we propose a framework to leverage as much as possible a priori-known physics for a problem. Our approach embeds a deep neural network in a partial differential equation (PDE) system, where the pre-specified terms in the PDE capture the known physics and the neural network will learn to describe the unknown physics. The neural network is estimated from experimental and/or high-fidelity numerical datasets. We call this approach a “deep learning PDE model” (DPM). Once trained, the DPM can be used to make out-of-sample predictions for new physical coefficients, geometries, and boundary conditions. We implement our approach on a classic problem of the Navier-Stokes equation for turbulent flows. The numerical solution of the Navier-Stokes equation (with turbulence) is computationally expensive and requires a supercomputer. We show that our approach can estimate a (computationally fast) DPM for the filtered velocity of the Navier-Stokes equations.

A central task in modeling, which has to be performed each day in banks and financial institutions, is to calibrate models to market and historical data. So far the choice which models should be used was not only driven by their capacity of capturing empirically the observed market features well, but rather by computational tractability considerations. Due to recent work in the context of machine learning, this notion of tractability has changed significantly. In this work, we show how a neural network approach can be applied to the calibration of (multivariate) local stochastic volatility models. We will see how an efficient calibration is possible without the need of interpolation methods for the financial data. Joint work with Christa Cuchiero and Josef Teichmann.

Data analysis techniques are often highly domain specific - there are often certain patterns which should be in certain types of data but may not be apparent in data. The first part of the talk will cover a technique for finding such patterns through a tool which combines visual analytics and machine learning to provide insight into temporal multivariate data. The second half of the talk will discuss recent work on imposing high level geometric structure into continuous optimizations including deep neural networks.

Dimension reduction is an overarching theme in data science: we enjoy finding informative patterns, features or substructures in large, complex data sets. Within the field of network science, an important problem of this nature is to identify core-periphery structure. Given a network, our task is to assign each node to either the core or periphery. Core nodes should be strongly connected across the whole network whereas peripheral nodes should be strongly connected only to core nodes. More generally, we may wish to assign a non-negative value to each node, with a larger value indicating greater "coreness." This type of problem is related to, but distinct from, commumnity detection (finding clusters) and centrality assignment (finding key players), and it arises naturally in the study of networks in social science and finance. We derive and analyse a new iterative algorithm for detecting network core-periphery structure.

Using techniques in nonlinear Perron-Frobenius theory we prove global convergence to the unique solution of a relaxed version of a natural discrete optimization problem. On sparse networks, the cost of each iteration scales linearly with the number of nodes, making the algorithm feasible for large-scale problems. We give an alternative interpretation of the algorithm from the perspective of maximum likelihood reordering of a new logistic core--periphery random graph model. This viewpoint also gives a new basis for quantitatively judging a core--periphery detection algorithm. We illustrate the algorithm on a range of synthetic and real networks, and show that it offers advantages over the current state-of-the-art.

This is joint work with Francesco Tudisco (Strathclyde)

The problem of decomposing a given dataset as a superposition of basic motifs arises in a wide range of application areas, including neural spike sorting and the analysis of astrophysical and microscopy data. Motivated by these problems, we study a "short-and-sparse" deconvolution problem, in which the goal is to recover a short motif a from its convolution with a random spike train $x$. We formulate this problem as optimization over the sphere. We analyze the geometry of this (nonconvex) optimization problem, and argue that when the target spike train is sufficiently sparse, on a region of the sphere, every local minimum is equivalent to the ground truth, up to symmetry (here a signed shift). This characterization obtains, e.g., for generic kernels of length $k$, when the sparsity rate of the spike train is proportional to $k^{-2/3}$ (i.e., roughly $k^{1/3}$ spikes in each length-$k$ window). This geometric characterization implies that efficient methods obtain the ground truth under the same conditions.

Our analysis highlights the key roles of symmetry and negative curvature in the behavior of efficient methods -- in particular, the role of a "dispersive" structure in promoting efficient convergence to global optimizers without the need to explicitly leverage second-order information. We sketch connections to broader families of benign nonconvex problems in machine learning and signal processing, in which efficient methods obtain global optima independent of initialization. These problems include variants of sparse dictionary learning, tensor decomposition, and phase recovery.

Joint work with Yuqian Zhang, Yenson Lau, Han-Wen Kuo, Dar Gilboa, Sky Cheung, Abhay Pasupathy