We will present a very general framework for unconstrained stochastic optimization which is based on standard trust region framework using random models. In particular this framework retains the desirable features such step acceptance criterion, trust region adjustment and ability to utilize of second order models. We make assumptions on the stochasticity that are different from the typical assumptions of stochastic and simulation-based optimization. In particular we assume that our models and function values satisfy some good quality conditions with some probability fixed, but can be arbitrarily bad otherwise. We will analyze the convergence and convergence rates of this general framework and discuss the requirement on the models and function values. We will will contrast our results with existing results from stochastic approximation literature. We will finish with examples of applications arising the area of machine learning.

# Past Computational Mathematics and Applications Seminar

Work with Jemima Tabeart, Sarah Dance, Nancy Nichols, Joanne Waller (University of Reading) and Stefano Migliorini, Fiona Smith (Met Office).

In environmental prediction variational data assimilation (DA) is a method for using observational data to estimate the current state of the system. The DA problem is usually solved as a very large nonlinear least squares problem, in which the fit to the measurements is balanced against the fit to a previous model forecast. These two terms are weighted by matrices describing the correlations of the errors in the forecast and in the observations. Until recently most operational weather and ocean forecasting systems assumed that the errors in the observations are uncorrelated. However, as we move to higher resolution observations then it is becoming more important to specify observation error correlations. In this work we look at the effect this has on the conditioning of the optimization problem. In the context of a linear system we develop bounds on the condition number of the problem in the presence of correlated observation errors. We show that the condition number is very dependent on the minimum eigenvalue of the observation error correlation matrix. We then present results using the Met Office data assimilation system, in which different methods for reconditioning the correlation matrix are tested. We investigate the effect of these different methods on the conditioning and the final solution of the problem.

Inverse problems are ubiquitous in many areas of Science and Engineering and, once discretised, they lead to ill-conditioned linear systems, often of huge dimensions: regularisation consists in replacing the original system by a nearby problem with better numerical properties, in order to find meaningful approximations of its solution. In this talk we will explore the regularisation properties of many iterative methods based on Krylov subspaces. After surveying some basic methods such as CGLS and GMRES, innovative approaches based on flexible variants of CGLS and GMRES will be presented, in order to efficiently enforce nonnegativity and sparsity into the solution.

We consider the Cauchy (or steepest descent) method with exact line search applied to a strongly convex function with Lipschitz continuous gradient. We establish the exact worst-case rate of convergence of this scheme, and show that this worst-case behavior is exhibited by a certain convex quadratic function. We also give worst-case complexity bound for a noisy variant of gradient descent method. Finally, we show that these results may be applied to study the worst-case performance of Newton's method for the minimization of self-concordant functions.

The proofs are computer-assisted, and rely on the resolution of semidefinite programming performance estimation problems as introduced in the paper [Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2):451-482, 2014].

Joint work with F. Glineur and A.B. Taylor.

Functions defined by evaluation programs involving smooth elementals and absolute values as well as max and min are piecewise smooth. For this class we present first and second order, necessary and sufficient conditions for the functions to be locally optimal, or convex, or at least possess a supporting hyperplane. The conditions generalize the classical KKT and SSC theory and are constructive; though in the case of convexity they may be combinatorial to verify. As a side product we find that, under the Mangasarin-Fromowitz-Kink-Qualification, the well established nonsmooth concept of subdifferential regularity is equivalent to first order convexity. All results are based on piecewise linearization and suggest corresponding optimization algorithms.

We propose a multilevel paradigm for the global optimisation of polynomials with sparse support. Such polynomials arise through the discretisation of PDEs, optimal control problems and in global optimization applications in general. We construct projection operators to relate the primal and dual variables of the SDP relaxation between lower and higher levels in the hierarchy, and theoretical results are proven to confirm their usefulness. Numerical results are presented for polynomial problems that show how these operators can be used in a hierarchical fashion to solve large scale problems with high accuracy.

I will present a broad family of stochastic algorithms for inverting a matrix, including specialized variants which maintain symmetry or positive definiteness of the iterates. All methods in the family converge globally and linearly, with explicit rates. In special cases, the methods obtained are stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). After a pause for questions, I will then present a block stochastic BFGS method based on the stochastic method for inverting positive definite matrices. In this method, the estimate of the inverse Hessian matrix that is maintained by it, is updated at each iteration using a sketch of the Hessian, i.e., a randomly generated compressed form of the Hessian. I will propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients, and prove linear convergence of the resulting method. Numerical tests on large-scale logistic regression problems reveal that our method is more robust and substantially outperforms current state-of-the-art methods.

Most current methods of Magnetic Resonance Imaging (MRI) reconstruction interpret raw signal values as samples of the Fourier transform of the object. Although this is computationally convenient, it neglects relaxation and off–resonance evolution in phase, both of which can occur to significant extent during a typical MRI signal. A more accurate model, known as Parameter Assessment by Recovery from Signal Encoding (PARSE), takes the time evolution of the signal into consideration. This model uses three parameters that depend on tissue properties: transverse magnetization, signal decay rate, and frequency offset from resonance. Two difficulties in recovering an image using this model are the low SNR for long acquisition times in single-shot MRI, and the nonlinear dependence of the signal on the decay rate and frequency offset. In this talk, we address the latter issue by using a second order approximation of the original PARSE model. The linearized model can be solved using convex optimization augmented with well-stablished regularization techniques such as total variation. The sensitivity of the parameters to noise and computational challenges associated with this approximation will be discussed.