I will present a broad family of stochastic algorithms for inverting a matrix, including specialized variants which maintain symmetry or positive definiteness of the iterates. All methods in the family converge globally and linearly, with explicit rates. In special cases, the methods obtained are stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). After a pause for questions, I will then present a block stochastic BFGS method based on the stochastic method for inverting positive definite matrices. In this method, the estimate of the inverse Hessian matrix that is maintained by it, is updated at each iteration using a sketch of the Hessian, i.e., a randomly generated compressed form of the Hessian. I will propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients, and prove linear convergence of the resulting method. Numerical tests on large-scale logistic regression problems reveal that our method is more robust and substantially outperforms current state-of-the-art methods.

# Past Computational Mathematics and Applications Seminar

Most current methods of Magnetic Resonance Imaging (MRI) reconstruction interpret raw signal values as samples of the Fourier transform of the object. Although this is computationally convenient, it neglects relaxation and off–resonance evolution in phase, both of which can occur to significant extent during a typical MRI signal. A more accurate model, known as Parameter Assessment by Recovery from Signal Encoding (PARSE), takes the time evolution of the signal into consideration. This model uses three parameters that depend on tissue properties: transverse magnetization, signal decay rate, and frequency offset from resonance. Two difficulties in recovering an image using this model are the low SNR for long acquisition times in single-shot MRI, and the nonlinear dependence of the signal on the decay rate and frequency offset. In this talk, we address the latter issue by using a second order approximation of the original PARSE model. The linearized model can be solved using convex optimization augmented with well-stablished regularization techniques such as total variation. The sensitivity of the parameters to noise and computational challenges associated with this approximation will be discussed.

We consider the problem of computing a nonnegative low rank factorization to a given nonnegative input matrix under the so-called "separabilty condition". This assumption makes this otherwise NP hard problem polynomial time solvable, and we will use first order optimization techniques to compute such a factorization. The optimization model use is based on sparse regression with a self-dictionary, in which the low rank constraint is relaxed to the minimization of an l1-norm objective function. We apply these techniques to endmember detection and classification in hyperspecral imaging data.

The matrix logarithm, when applied to symmetric positive definite matrices, is known to satisfy a notable concavity property in the positive semidefinite (Loewner) order. This concavity property is a cornerstone result in the study of operator convex functions and has important applications in matrix concentration inequalities and quantum information theory.

In this talk I will show that certain rational approximations of the matrix logarithm remarkably preserve this concavity property and moreover, are amenable to semidefinite programming. Such approximations allow us to use off-the-shelf semidefinite programming solvers for convex optimization problems involving the matrix logarithm. These approximations are also useful in the scalar case and provide a much faster alternative to existing methods based on successive approximation for problems involving the exponential/relative entropy cone. I will conclude by showing some applications to problems arising in quantum information theory.

This is joint work with James Saunderson (Monash University) and Pablo Parrilo (MIT)

Rational Krylov methods are applicable to a wide range of scientific computing problems, and the rational Arnoldi algorithm is a commonly used procedure for computing an orthonormal basis of a rational Krylov space. Typically, the computationally most expensive component of this algorithm is the solution of a large linear system of equations at each iteration. We explore the option of solving several linear systems simultaneously, thus constructing the rational Krylov basis in parallel. If this is not done carefully, the basis being orthogonalized may become badly conditioned, leading to numerical instabilities in the orthogonalization process. We introduce the new concept of continuation pairs which gives rise to a near-optimal parallelization strategy that allows to control the growth of the condition number of this nonorthogonal basis. As a consequence we obtain a significantly more accurate and reliable parallel rational Arnoldi algorithm.

The computational benefits are illustrated using several numerical examples from different application areas.

This talk is based on joint work with Mario Berljafa available as an Eprint at http://eprints.ma.man.ac.uk/2503/

We present global rates of convergence for a general class of methods for nonconvex smooth optimization that include linesearch, trust-region and regularisation strategies, but that allow inaccurate problem information. Namely, we assume the local (first- or second-order) models of our function are only sufficiently accurate with a certain probability, and they can be arbitrarily poor otherwise. This framework subsumes certain stochastic gradient analyses and derivative-free techniques based on random sampling of function values. It can also be viewed as a robustness

assessment of deterministic methods and their resilience to inaccurate derivative computation such as due to processor failure in a distribute framework. We show that in terms of the order of the accuracy, the evaluation complexity of such methods is the same as their counterparts that use deterministic accurate models; the use of probabilistic models only increases the complexity by a constant, which depends on the probability of the models being good. Time permitting, we also discuss the case of inaccurate, probabilistic function value information, that arises in stochastic optimization. This work is joint with Katya Scheinberg (Lehigh University, USA).

For linear dynamical systems, model reduction has achieved great success. In the case of linear dynamics, we know how to construct, at a modest cost, (locally) optimal, input-independent reduced models; that is, reduced models that are uniformly good over all inputs having bounded energy. In addition, in some cases we can achieve this goal using only input/output data without a priori knowledge of internal dynamics. Even though model reduction has been successfully and effectively applied to nonlinear dynamical systems as well, in this setting, bot the reduction process and the reduced models are input dependent and the high fidelity of the resulting approximation is generically restricted to the training input/data. In this talk, we will offer remedies to this situation.

To predict the behaviour of a dynamical system using a mathematical model, an accurate estimate of the current state of the system is needed in order to initialize the model. Complete information on the current state is, however, seldom available. The aim of optimal state estimation, known in the geophysical sciences as ‘data assimilation’, is to determine a best estimate of the current state using measured observations of the real system over time, together with the model equations. The problem is commonly formulated in variational terms as a very large nonlinear least-squares optimization problem. The lack of complete data, coupled with errors in the observations and in the model, leads to a highly ill-conditioned inverse problem that is difficult to solve.

To understand the nature of the inverse problem, we examine how different components of the assimilation system influence the conditioning of the optimization problem. First we consider the case where the dynamical equations are assumed to model the real system exactly. We show, against intuition, that with increasingly dense and precise observations, the problem becomes harder to solve accurately. We then extend these results to a 'weak-constraint' form of the problem, where the model equations are assumed not to be exact, but to contain random errors. Two different, but mathematically equivalent, forms of the problem are derived. We investigate the conditioning of these two forms and find, surprisingly, that these have quite different behaviour.