The phenomenon of poor algorithmic scalability is a critical problem in large-scale machine learning and data science. This has led to a resurgence in the use of first-order (Hessian-free) algorithms from classical optimisation. One major drawback is that first-order methods tend to converge extremely slowly. However, there exist techniques for efficiently accelerating them.

The topic of this talk is the Dual Regularisation Nonlinear Acceleration algorithm (DRNA) (Geleta, 2017) for nonconvex optimisation. Numerical studies using the CUTEst optimisation problem set show the method to accelerate several nonconvex optimisation algorithms, including quasi-Newton BFGS and steepest descent methods. DRNA compares favourably with a number of existing accelerators in these studies.

DRNA extends to the nonconvex setting a recent acceleration algorithm due to Scieur et al. (Advances in Neural Information Processing Systems 29, 2016). We have proven theorems relating DRNA to the Kylov subspace method GMRES, as well as to Anderson's acceleration method and family of multi-secant quasi-Newton methods.

# Past Numerical Analysis Group Internal Seminar

In this talk we introduce a novel dynamic programming (DP) approximation that exploits the inherent network structure present in revenue management problems. In particular, our approximation provides a new lower bound on the value function for the DP, which enables conservative revenue forecasts to be made. Existing state of the art approximations of the revenue management DP neglect the network structure, apportioning the prices of each product, whereas our proposed method does not: we partition the network of products into clusters by apportioning the capacities of resources. Our proposed approach allows, in principle, for better approximations of the DP to be made than the decomposition methods currently implemented in industry and we see it as an important stepping stone towards better approximate DP methods in practice.

One of the key challenges in revenue management is unconstraining demand data. Existing state of the art single-class unconstraining methods make restrictive assumptions about the form of the underlying demand and can perform poorly when applied to data which breaks these assumptions. In this talk, we propose a novel unconstraining method that uses Gaussian process (GP) regression. We develop a novel GP model by constructing and implementing a new non-stationary covariance function for the GP which enables it to learn and extrapolate the underlying demand trend. We show that this method can cope with important features of realistic demand data, including nonlinear demand trends, variations in total demand, lengthy periods of constraining, non-exponential inter-arrival times, and discontinuities/changepoints in demand data. In all such circumstances, our results indicate that GPs outperform existing single-class unconstraining methods.

In this talk we describe a new approach that enables the use of elliptic PDEs with white noise forcing to sample Matérn fields within the multilevel Monte Carlo (MLMC) framework.

When MLMC is used to quantify the uncertainty in the solution of PDEs with random coefficients, two key ingredients are needed: 1) a sampling technique for the coefficients that satisfies the MLMC telescopic sum and 2) a numerical solver for the forward PDE problem.

When the dimensionality of the uncertainty in the problem is infinite (i.e. coefficients are random fields), the sampling techniques commonly used in the literature are Karhunen–Loève expansions or circulant embeddings. In the specific case in which the coefficients are Gaussian fields of Mat ́ern covariance structure another sampling technique available relies on the solution of a linear elliptic PDE with white noise forcing.

When the finite element method (FEM) is used for the forward problem, the latter option can become advantageous as elliptic PDEs can be quickly and efficiently solved with the FEM, the sampling can be performed in parallel and the same FEM software can be used without the need for external packages. However, it is unclear how to enforce a good stochastic coupling of white noise between MLMC levels so as to respect the MLMC telescopic sum. In this talk we show how this coupling can be enforced in theory and in practice.

We propose and analyze a multilevel weighted least squares polynomial approximation method. Weighted least squares polynomial approximation uses random samples to determine projections of functions onto spaces of polynomials. It has been shown that using an optimal distribution of sample locations, the number of samples required to achieve quasi-optimal approximation in a given polynomial subspace scales, up to a logarithmic factor, linearly in the dimension of this space. However, in many applications, the computation of samples includes a numerical discretization error. Thus, obtaining polynomial approximations with a single level method can become prohibitively expensive, as it requires a sufficiently large number of samples, each computed with a sufficiently small discretization error. As a solution to this problem, we propose a multilevel method, which employs samples with different accuracies and is able to match the accuracy of single level approximations at reduced computational work. We prove complexity bounds under certain assumptions on polynomial approximability and sample work. Furthermore, we propose an adaptive

algorithm for situations where such assumptions cannot be verified a priori. Numerical experiments underline the practical applicability of our method.

In this talk, a novel discontinuous Galerkin (DG) method is introduced by utilising the principle of discrete least squares. The key idea is to build polynomial approximations by the method of (weighted) discrete least squares instead of usual interpolation or (discrete) $L^2$ projections. The resulting method hence uses more information of the underlying function and provides a more robust alternative to common DG methods. As a result, we are able to construct high-order schemes which are conservative as well as linear stable on *any* set of collocation points. Several numerical tests highlight the new *discontinuous Galerkin discrete least squares (DG-DLS)* method to significantly outperform present-day DG methods.

High-order methods for conservation laws can be highly efficient if their stability is ensured. A suitable means mimicking estimates of the continuous level is provided by summation-by-parts (SBP) operators and the weak enforcement of boundary conditions. Recently, there has been an increasing interest in generalised SBP operators both in the finite difference and the discontinuous Galerkin spectral element framework.

However, if generalised SBP operators are used, the treatment of boundaries becomes more difficult since some properties of the continuous level are no longer mimicked discretely —interpolating the product of two functions will in general result in a value different from the product of the interpolations. Thus, desired properties such as conservation and stability are more difficult to obtain.

In this talk, the concept of generalised SBP operators and their application to entropy stable semidiscretisations will be presented. Several recent ideas extending the range of possible methods are discussed, presenting both advantages and several shortcomings.

We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e., each data point is a solution to a system of polynomial equations. In this case, the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. We study the sampling requirements for matrix completion under a variety model with a focus on a union of subspaces. We also propose an efficient matrix completion algorithm that minimizes a surrogate of the rank of the matrix of monomial features, which is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low-rank matrix completion and subspace clustering techniques in experiments with real data.

Starting from an example in quantum chemistry, we explain the techniques of Numerical Tensor Calculus with particular emphasis on the convolution operation. The tensorisation technique also applies to one-dimensional grid functions and allows to perform the convolution with a cost which may be much cheaper than the fast Fourier transform.