We present global rates of convergence for a general class of methods for nonconvex smooth optimization that include linesearch, trust-region and regularisation strategies, but that allow inaccurate problem information. Namely, we assume the local (first- or second-order) models of our function are only sufficiently accurate with a certain probability, and they can be arbitrarily poor otherwise. This framework subsumes certain stochastic gradient analyses and derivative-free techniques based on random sampling of function values. It can also be viewed as a robustness assessment of deterministic methods and their resilience to inaccurate derivative computation, such as due to processor failure in a distributed framework. We show that, in terms of the order of the accuracy, the evaluation complexity of such methods is the same as that of their counterparts that use deterministic, accurate models; the use of probabilistic models only increases the complexity by a constant, which depends on the probability of the models being good. Time permitting, we also discuss the case of inaccurate, probabilistic function value information, which arises in stochastic optimization. This work is joint with Katya Scheinberg (Lehigh University, USA).
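As a schematic illustration of this kind of probabilistic accuracy requirement (the notation here is ours, not necessarily that of the talk), one may ask the model gradient to be accurate, conditioned on the algorithm's history, with probability at least $p$:

$$\mathbb{P}\left( \|\nabla m_k(x_k) - \nabla f(x_k)\| \le \kappa\,\delta_k \;\middle|\; \mathcal{F}_{k-1} \right) \ge p,$$

where $m_k$ is the model used at iteration $k$, $\delta_k$ is a measure of the step or trust-region size, and $\mathcal{F}_{k-1}$ is the history of the algorithm; on the complementary event the model may be arbitrarily poor. Complexity bounds then typically match the deterministic ones up to a constant factor depending only on $p$.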

# Past Computational Mathematics and Applications Seminar

For linear dynamical systems, model reduction has achieved great success. In the case of linear dynamics, we know how to construct, at a modest cost, (locally) optimal, input-independent reduced models; that is, reduced models that are uniformly good over all inputs having bounded energy. In addition, in some cases we can achieve this goal using only input/output data, without a priori knowledge of the internal dynamics. Even though model reduction has also been applied successfully and effectively to nonlinear dynamical systems, in this setting both the reduction process and the reduced models are input dependent, and the high fidelity of the resulting approximation is generically restricted to the training inputs. In this talk, we will offer remedies to this situation.

To predict the behaviour of a dynamical system using a mathematical model, an accurate estimate of the current state of the system is needed in order to initialize the model. Complete information on the current state is, however, seldom available. The aim of optimal state estimation, known in the geophysical sciences as ‘data assimilation’, is to determine a best estimate of the current state using measured observations of the real system over time, together with the model equations. The problem is commonly formulated in variational terms as a very large nonlinear least-squares optimization problem. The lack of complete data, coupled with errors in the observations and in the model, leads to a highly ill-conditioned inverse problem that is difficult to solve.
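For concreteness, a standard strong-constraint formulation of this least-squares problem (the usual ‘4D-Var’ cost function; notation ours, not necessarily the talk's) is

$$J(x_0) = \frac{1}{2}\,(x_0 - x_b)^{T} B^{-1} (x_0 - x_b) \;+\; \frac{1}{2}\sum_{i=0}^{N} \big(\mathcal{H}_i(x_i) - y_i\big)^{T} R_i^{-1} \big(\mathcal{H}_i(x_i) - y_i\big), \qquad x_{i+1} = \mathcal{M}_i(x_i),$$

where $x_b$ is a prior (background) state estimate with error covariance $B$, the $y_i$ are observations with error covariances $R_i$, $\mathcal{H}_i$ maps model states to observed quantities, and $\mathcal{M}_i$ is the forecast model.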

To understand the nature of the inverse problem, we examine how different components of the assimilation system influence the conditioning of the optimization problem. First we consider the case where the dynamical equations are assumed to model the real system exactly. We show, against intuition, that with increasingly dense and precise observations, the problem becomes harder to solve accurately. We then extend these results to a ‘weak-constraint’ form of the problem, where the model equations are assumed not to be exact, but to contain random errors. Two different, but mathematically equivalent, forms of the problem are derived. We investigate the conditioning of these two forms and find, surprisingly, that these have quite different behaviour.
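To make the ‘two forms’ concrete (a standard construction in weak-constraint variational assimilation; notation ours), let $\eta_i = x_i - \mathcal{M}_{i-1}(x_{i-1})$ denote the model error at step $i$, with covariance $Q_i$. The cost function then acquires the extra penalty

$$\frac{1}{2}\sum_{i=1}^{N} \eta_i^{T} Q_i^{-1} \eta_i,$$

and one may minimise either over the trajectory $(x_0,\dots,x_N)$ or, after a change of variables, over $(x_0,\eta_1,\dots,\eta_N)$. The two parametrisations are mathematically equivalent, but their Hessians, and hence the conditioning of the optimization, can differ markedly.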

The requirement to compute Jordan blocks for multiple eigenvalues arises in a number of physical problems, for example panel flutter problems in aerodynamic stability, the stability of electrical power systems, and in quantum mechanics. We introduce a general method for computing a 2-dimensional Jordan block in a parameter-dependent matrix eigenvalue problem, based on the so-called Implicit Determinant Method. This is joint work with Alastair Spence (Bath).
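As a rough sketch of the underlying idea (our notation; the details in the talk will differ): for a matrix $A(s)$ depending on a parameter $s$, the Implicit Determinant Method defines a scalar function $f(\lambda, s)$ through the bordered linear system

$$\begin{pmatrix} A(s) - \lambda I & b \\ c^{T} & 0 \end{pmatrix} \begin{pmatrix} x \\ f \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

for fixed vectors $b, c$ chosen so that the bordered matrix is nonsingular. By Cramer's rule, $f(\lambda, s)$ is a scalar multiple of $\det(A(s) - \lambda I)$, so eigenvalues are roots of $f$; a 2-dimensional Jordan block corresponds to the defective case $f = \partial f/\partial \lambda = 0$, a system that can be solved for $(\lambda, s)$ by Newton's method, with the required derivatives obtained from further solves with the same bordered matrix.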

In many applications we need to find or estimate the $p \ge 1$ largest elements of a matrix, along with their locations. This is required, for example, in recommender systems such as those used by Amazon and Netflix, in link prediction in graphs, and in finding the most important links in a complex network.

Our algorithm uses only matrix-vector products and is based upon a power method for mixed subordinate norms. We have obtained theoretical results on the convergence of this algorithm via a comparison with rook pivoting for the LU decomposition. We have also improved the practicality of the algorithm by producing a blocked version that iterates on $n \times t$ matrices, as opposed to vectors, where $t$ is a tunable parameter. For $p > 1$ we show how deflation can be used to improve the convergence of the algorithm.
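To illustrate the connection (a minimal NumPy sketch of the basic idea for $p = 1$, assuming a dense matrix; this is our simplification, not the authors' implementation): the largest element in magnitude equals the mixed subordinate $(1,\infty)$-norm of $A$, and the power method for that norm alternates between the largest entry of a column and the largest entry of a row, exactly as rook pivoting does.

```python
import numpy as np

def largest_element_estimate(A, maxit=100):
    """Estimate max_{i,j} |A[i, j]| (the (1, infty) mixed subordinate norm)
    using only products with A and A.T, in the spirit of a rook search."""
    m, n = A.shape
    j, sx = 0, 1.0                 # current column index and sign of the iterate
    est, loc = 0.0, (0, 0)
    for _ in range(maxit):
        x = np.zeros(n)
        x[j] = sx                  # ||x||_1 = 1
        y = A @ x                  # one product with A: +/- column j of A
        i = int(np.argmax(np.abs(y)))
        if abs(y[i]) <= est:       # no strict improvement: a 'rook point' reached
            break
        est, loc = abs(y[i]), (i, j)
        e = np.zeros(m)
        e[i] = np.sign(y[i])
        z = A.T @ e                # one product with A.T: +/- row i of A
        j = int(np.argmax(np.abs(z)))
        sx = np.sign(z[j])
    return est, loc
```

For $p > 1$ one would deflate the entries already found and repeat; the blocked version replaces the single vector $x$ by an $n \times t$ block, so each step searches $t$ columns at once.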

Finally, numerical experiments on both randomly generated matrices and real-life datasets (the latter for $A^TA$ and $e^A$) show how our algorithms can reliably estimate the largest elements of a matrix whilst obtaining considerable speedups when compared to forming the matrix explicitly: over 1000x in some cases.

For many years, sum-factored algorithms for finite elements in rectangular reference geometry have combined low complexity with the mathematical power of high-order approximation. However, such algorithms rely heavily on the tensor product structure inherent in the geometry and basis functions, and similar algorithms for simplicial geometry have proven elusive.
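To recall what sum factorization buys in the tensor-product case (a generic 2D illustration in NumPy, ours rather than the speaker's): evaluating $u(x_q, y_r) = \sum_{i,j} c_{ij}\,\phi_i(x_q)\,\phi_j(y_r)$ at an $n \times n$ grid of points costs $O(n^4)$ operations if done naively, but factorizing the sums reduces it to two matrix products and $O(n^3)$ operations.

```python
import numpy as np

n = 8                           # illustrative: degree + 1 basis functions / points
Phi = np.random.rand(n, n)      # Phi[q, i] = phi_i at 1D point x_q (stand-in values)
C = np.random.rand(n, n)       # coefficients c[i, j] of the tensor-product expansion

# Naive evaluation: a full quadruple sum, O(n^4) work in 2D.
U_naive = np.einsum('qi,rj,ij->qr', Phi, Phi, C, optimize=False)

# Sum-factored evaluation: sum over i first, then over j -- two GEMMs, O(n^3) work.
U_fast = Phi @ C @ Phi.T

assert np.allclose(U_naive, U_fast)
```

In $d$ dimensions the gap is $O(n^{2d})$ versus $O(n^{d+1})$, and it is precisely this factorization that has no direct analogue on simplices.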

Bernstein polynomials are totally nonnegative, rotationally symmetric, and geometrically decomposed bases with many other remarkable properties that lead to optimal-complexity algorithms for elementwise finite element computations. They also form natural building blocks for the finite element exterior calculus bases for the de Rham complex, so that H(div) and H(curl) bases have efficient representations as well. We will also discuss their relevance for explicit discontinuous Galerkin methods, where the element mass matrix requires special attention.
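As a small taste of the recursive structure such algorithms exploit (a univariate illustration of ours, not code from the talk): de Casteljau's algorithm evaluates a polynomial in Bernstein form using only convex combinations, a direct consequence of the total nonnegativity of the basis.

```python
import numpy as np

def de_casteljau(b, t):
    """Evaluate p(t) = sum_k b[k] * C(n, k) * t^k * (1 - t)^(n - k), t in [0, 1].

    Each pass replaces adjacent coefficient pairs by convex combinations,
    so every intermediate quantity stays in the convex hull of the inputs."""
    b = np.asarray(b, dtype=float)
    while b.size > 1:
        b = (1.0 - t) * b[:-1] + t * b[1:]
    return b[0]

# Degree-2 example: coefficients (1, 2, 4) at t = 1/2 give (1 + 2*2 + 4)/4 = 2.25.
print(de_casteljau([1.0, 2.0, 4.0], 0.5))
```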