We will review the basic building blocks of iterative solvers, i.e. sparse matrix-vector multiplication, in the context of GPU devices such as the cards by NVIDIA; we will then discuss some techniques in preconditioning by approximate inverses, and we will conclude with an application to an image processing problem from the biomedical field.
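
As a rough illustration of the sparse matrix-vector product mentioned in this abstract, the following is a minimal sketch in plain Python using the compressed sparse row (CSR) layout. The function name `csr_matvec` and the small test matrix are assumptions made for illustration; they are not taken from the talk, and an actual GPU kernel would look quite different.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    data    -- nonzero values, stored row by row
    indices -- column index of each nonzero value
    indptr  -- indptr[i]:indptr[i+1] is the slice of `data` belonging to row i
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Each row is an independent dot product over its nonzeros.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
```

Because the rows are independent of one another, the outer loop is the natural place to parallelise, for example by assigning one row to each thread; whether that mapping is the one discussed in the talk is not stated in the abstract.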

# Past Computational Mathematics and Applications Seminar

The aim of this talk is to design an efficient multigrid method for constrained convex optimization problems arising from the discretization of some underlying infinite-dimensional problems. Because this approach is problem dependent, we only consider bound constraints with (possibly) a linear equality constraint. As our aim is to target large-scale problems, we want to avoid computing second derivatives of the objective function, thus excluding Newton-like methods. We propose a smoothing operator that only uses first-order information and study the computational efficiency of the resulting method. In the second part, we consider the application of multigrid techniques to more general optimization problems, in particular the topology design problem.

Quadrature is the term for the numerical evaluation of integrals. It's a beautiful subject because it's so accessible, yet full of conceptual surprises and challenges. This talk will review ten of these, with plenty of history and numerical demonstrations. Some are old if not well known, some are new, and two are subjects of my current research.

Partial differential equations with more than three coordinates arise naturally if the model features certain kinds of stochasticity. Typical examples are the Schroedinger, Fokker-Planck and Master equations in quantum mechanics or cell biology, as well as quantification of uncertainty.

The principal difficulty of a straightforward numerical solution of such equations is the "curse of dimensionality": the storage cost of the discrete solution grows exponentially with the number of coordinates (dimensions).

One way to reduce the complexity is the low-rank separation of variables. One can see all discrete data (such as the solution) as multi-index arrays, or tensors. These large tensors are never stored directly.

We approximate them by a sum of products of smaller factors, each carrying only one of the original variables. I will present one of the simplest yet most powerful of such representations, the Tensor Train (TT) decomposition. The TT decomposition generalizes the approximation of a given matrix by a low-rank matrix to the tensor case. It was found that many interesting models allow such approximations with a significant reduction of storage demands.
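
To make the storage argument concrete, here is a minimal sketch in plain Python. It builds, by hand, an exact TT representation with rank 2 of the tensor whose entries are the sum of its indices, A[i1, ..., id] = i1 + ... + id, and compares the number of stored values with that of the full array. The tensor, the helper names (`tt_cores_sum`, `tt_entry`, `tt_storage`) and the sizes are assumptions chosen for illustration; the sketch does not reproduce the algorithms of the talk.

```python
def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def tt_cores_sum(d, n):
    """Rank-2 TT cores of A[i1,...,id] = i1 + ... + id (requires d >= 2).

    cores[k][i] is the matrix attached to index value i in dimension k.
    """
    first = [[[float(i), 1.0]] for i in range(n)]                # 1 x 2
    middle = [[[1.0, 0.0], [float(i), 1.0]] for i in range(n)]   # 2 x 2
    last = [[[1.0], [float(i)]] for i in range(n)]               # 2 x 1
    return [first] + [middle] * (d - 2) + [last]


def tt_entry(cores, idx):
    """Evaluate one tensor entry as a product of one matrix per dimension."""
    m = cores[0][idx[0]]
    for core, i in zip(cores[1:], idx[1:]):
        m = matmul(m, core[i])
    return m[0][0]


def tt_storage(cores):
    """Number of values needed to store all cores (one copy per dimension)."""
    return sum(len(core) * len(core[0]) * len(core[0][0]) for core in cores)


d, n = 50, 10                      # 50 dimensions, 10 points per dimension
cores = tt_cores_sum(d, n)
idx = [k % n for k in range(d)]    # an arbitrary multi-index
value = tt_entry(cores, idx)       # equals sum(idx)
full_storage = n ** d              # entries of the full tensor
tt_values = tt_storage(cores)      # values stored in TT format
```

In this example the full tensor has `n ** d` entries, while the TT cores hold on the order of `d * n * r**2` values with rank `r = 2`, which is the reduction from exponential to linear growth in the number of dimensions.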

A workhorse approach to computations with the TT and other tensor product decompositions is the alternating optimization of factors. The simple realization is however prone to convergence issues.

I will show some of the recent improvements that are indispensable for problems with a very large number of dimensions, or for the solution of linear systems with non-symmetric or indefinite matrices.

To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications.

In this talk we present the design of task-based sparse direct solvers on top of runtime systems. In the context of the qr_mumps solver, we demonstrate the usability and effectiveness of our approach with the implementation of a sparse multifrontal factorization based on a Sequential Task Flow parallel programming model. Using this programming model, we developed features such as the integration of dense 2D Communication Avoiding algorithms in the multifrontal method, allowing for better scalability compared to the original approach used in qr_mumps.

Following this approach, we move to heterogeneous architectures, where task granularity and scheduling strategies are critical to achieving performance. We present, for the multifrontal method, a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources. Finally, we introduce a memory-aware algorithm to control the memory behavior of our solver and show, in the context of multicore architectures, a substantial reduction of the memory footprint for the multifrontal QR factorization with a small impact on performance.

Functions are usually approximated numerically in a basis, a non-redundant and complete set of functions that span a certain space. In this talk we highlight a number of benefits of using overcomplete sets, in particular using the more general notion of a "frame". The main benefit is that frames are easily constructed even for functions of several variables on domains with irregular shapes. On the other hand, allowing for possible linear dependencies naturally leads to ill-conditioning of approximation algorithms. The ill-conditioning is potentially severe. We give some useful examples of frames and we first address the numerical stability of best approximations in a frame. Next, we briefly describe special point sets in which interpolation turns out to be stable. Finally, we review so-called Fourier extensions and an efficient algorithm to approximate functions with spectral accuracy on domains without structure.
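
As a small illustration of a Fourier extension, the sketch below approximates a non-periodic function on the interval [-1/2, 1/2] using Fourier modes that are periodic on the larger interval [-1, 1]. Restricted to the smaller interval these modes are redundant, so the least-squares matrix is severely ill-conditioned; the sketch relies on the singular value cutoff `rcond` of `numpy.linalg.lstsq` as a simple regularisation. The intervals, the oversampling factor, the cutoff and the function names are assumptions made for illustration, and this dense solve is not the efficient algorithm referred to in the abstract.

```python
import numpy as np


def fourier_extension(f, n_modes, oversampling=4, rcond=1e-12):
    """Least-squares fit of f on [-1/2, 1/2] by modes exp(i*pi*k*x), |k| <= n_modes."""
    k = np.arange(-n_modes, n_modes + 1)
    # Oversampled equispaced points in the smaller interval only.
    x = np.linspace(-0.5, 0.5, oversampling * k.size)
    A = np.exp(1j * np.pi * np.outer(x, k))
    # Singular values below rcond * (largest singular value) are discarded,
    # which keeps the coefficients bounded despite the ill-conditioning.
    coef, *_ = np.linalg.lstsq(A, f(x).astype(complex), rcond=rcond)
    return k, coef


def evaluate(k, coef, x):
    """Evaluate the real part of the Fourier extension at the points x."""
    return (np.exp(1j * np.pi * np.outer(x, k)) @ coef).real


k, coef = fourier_extension(np.exp, n_modes=20)
x_test = np.linspace(-0.5, 0.5, 1001)
max_err = np.max(np.abs(evaluate(k, coef, x_test) - np.exp(x_test)))
```

Here `max_err` measures the error on a finer grid than the one used for fitting. The function `exp` is not periodic on [-1/2, 1/2], so a classical Fourier series on that interval would converge slowly, whereas the extension is free to choose any smooth periodic continuation outside it.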

When assigned the task of extracting information from given image data, the first challenge one faces is the derivation of a truthful model for both the information and the data. Such a model can be determined by the a-priori knowledge about the image (information), the data and their relation to each other. The source of this knowledge is either our understanding of the type of images we want to reconstruct and of the physics behind the acquisition of the data, or we can strive to learn parametric models from the data itself. The common question arises: how can we customise our model choice to a particular application? Or, better, how can we make our model adaptive to the given data?

Starting from the first modelling strategy, this talk will lead us from nonlinear diffusion equations and subdifferential inclusions of total variation type functionals, as the most successful image model today, to non-smooth second- and third-order variational models, with data models for Gaussian and Poisson distributed data as well as impulse noise. These models exhibit solution-dependent adaptivities in the form of nonlinearities or non-smooth terms in the PDE or the variational problem, respectively. Applications for image denoising, inpainting and surface reconstruction are given. After a critical discussion of these different image and data models, we will turn towards the second modelling strategy and propose to combine it with the first one using a PDE-constrained optimisation method that customises a parametrised form of the model by learning from examples. In particular, we will consider optimal parameter derivation for total variation denoising with multiple noise distributions and optimising total generalised variation regularisation for its application in photography.

Equations of quantum mechanics in the semiclassical regime present an enduring challenge for numerical analysts, because their solution is highly oscillatory and evolves on two scales. Standard computational approaches to the semiclassical Schrödinger equation do not allow for long time integration as required, for example, in quantum control of atoms by short laser bursts. This has motivated our approach of asymptotic splittings. Combining techniques from Lie-algebra theory and numerical algebra, we present a new computational paradigm of symmetric Zassenhaus splittings, which lends itself to a very precise discretisation over long time intervals, at very little cost. We will illustrate our talk by examples of quantum phenomena – quantum tunnelling and quantum scattering – and their computation and, time allowing, discuss an extension of this methodology to time-dependent semiclassical systems using Magnus expansions.