STFC Rutherford Appleton Laboratory | Mathematical Institute

Thu, 17 Oct 2024

14:00 - 15:00

Lecture Room 3

On the loss of orthogonality in low-synchronization variants of reorthogonalized block classical Gram-Schmidt

Kathryn Lund

(STFC Rutherford Appleton Laboratory)

Abstract

Interest in communication-avoiding orthogonalization schemes for high-performance computing has been growing recently. We address open questions about the numerical stability of various block classical Gram-Schmidt variants that have been proposed in the past few years. An abstract framework is employed, the flexibility of which allows for new rigorous bounds on the loss of orthogonality in these variants. We first analyse a generalization of (reorthogonalized) block classical Gram-Schmidt and show that a "strong" intrablock orthogonalization routine is only needed for the very first block in order to maintain orthogonality on the level of the unit roundoff.

Using this variant, which has four synchronization points per block column, we remove the synchronization points one at a time and analyse how each alteration affects the stability of the resulting method. Our analysis shows that the variant requiring only one synchronization per block column cannot be guaranteed to be stable in practice, as stability begins to degrade with the first reduction of synchronization points.

Our analysis of block methods also provides new theoretical results for the single-column case. In particular, it is proven that DCGS2 from [Bielich, D. et al., Par. Comput. 112 (2022)] and CGS-2 from [Swirydowicz, K. et al., Num. Lin. Alg. Appl. 28 (2021)] are as stable as Householder QR.
Numerical examples from the BlockStab toolbox are included throughout, to help compare variants and illustrate the effects of different choices of intraorthogonalization subroutines.
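
As a rough illustration of why reorthogonalization matters, the sketch below implements textbook single-column classical Gram-Schmidt with an optional second projection pass and measures the loss of orthogonality as the Frobenius norm of I - Q^T Q. It is not one of the block or low-synchronization variants analysed in the talk, and it does not use the BlockStab toolbox; the function names `dot`, `cgs` and `loss_of_orthogonality` and the test matrix are assumptions made for this example.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cgs(columns, passes=1):
    """Orthonormalize `columns` (a list of vectors) with classical Gram-Schmidt.

    passes=1 is plain CGS; passes=2 repeats the projection step
    (reorthogonalization). Assumes the columns are linearly independent.
    """
    basis = []
    for a in columns:
        w = list(a)
        for _ in range(passes):
            # Classical variant: all coefficients are computed from the same
            # vector w before any of them is subtracted.
            coeffs = [dot(q, w) for q in basis]
            for c, q in zip(coeffs, basis):
                w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def loss_of_orthogonality(basis):
    """Frobenius norm of I - Q^T Q for the computed basis Q."""
    total = 0.0
    for i, qi in enumerate(basis):
        for j, qj in enumerate(basis):
            target = 1.0 if i == j else 0.0
            total += (target - dot(qi, qj)) ** 2
    return math.sqrt(total)


# Three nearly dependent columns: an ill-conditioned test case chosen
# for illustration only.
eps = 1e-8
A = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]
loss_once = loss_of_orthogonality(cgs(A, passes=1))
loss_twice = loss_of_orthogonality(cgs(A, passes=2))
```

In IEEE double precision, one would expect `loss_once` to be of order one on this example and `loss_twice` to be close to the unit roundoff. Each projection pass costs extra inner products, which in a distributed setting means extra synchronization.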

Thu, 03 Nov 2022

14:00 - 15:00

Algebraic Spectral Multilevel Domain Decomposition Preconditioners

Hussam Al Daas

(STFC Rutherford Appleton Laboratory)

Abstract

Solving sparse linear systems is omnipresent in scientific computing. Direct approaches based on matrix factorization are very robust, and since they can be used as a black box, it is easy for other software to use them. However, the memory requirement of direct approaches scales poorly with the problem size, and the algorithms underpinning sparse direct solver software are poorly suited to parallel computation. Multilevel domain decomposition (MDD) methods are among the most efficient iterative methods for solving sparse linear systems. One of the main technical difficulties in using efficient MDD methods (and most other efficient preconditioners) is that they require information from the underlying problem, which prohibits them from being used as a black box. This was the motivation for developing the widely used algebraic multigrid, for example. I will present a series of recently developed robust and fully algebraic MDD methods, i.e., methods that can be constructed given only the coefficient matrix and that guarantee an a priori prescribed convergence rate. The series consists of preconditioners for sparse least-squares problems, sparse SPD matrices, general sparse matrices, and saddle-point systems. Numerical experiments illustrate the effectiveness, wide applicability, and scalability of the proposed preconditioners. A comparison of each one against state-of-the-art preconditioners is also presented.

Fri, 04 Dec 2015

10:00 - 11:00

Analysis of images in multidimensional single molecule microscopy

Michael Hirsch

(STFC Rutherford Appleton Laboratory)

Abstract

Multidimensional single molecule microscopy (MSMM) generates image time series of biomolecules in a cellular environment that have been tagged with fluorescent labels. Initial analysis steps of such images consist of image registration of multiple channels, feature detection and single particle tracking. Further analysis may involve the estimation of diffusion rates, the measurement of separations between molecules that are not optically resolved and more. The analysis is done under the condition of poor signal-to-noise ratios, high density of features and other adverse conditions. Pushing the boundary of what is measurable, we face, among others, the following challenges: firstly, the correct assessment of the uncertainties and the significance of the results; secondly, the fast and reliable identification of those features and tracks that fulfil the assumptions of the models used. Simpler models require more rigid preconditions and therefore limit the usable data, while more complex models are theoretically and especially computationally challenging.

Tue, 09 Jun 2015

14:00 - 14:30

Sparse matrix orderings: it's child's play! Or is it?

Sue Thorne

(STFC Rutherford Appleton Laboratory)

Abstract

Sparse matrices occur in numerical simulations throughout science and engineering. In particular, it is often desirable to solve systems of the form Ax=b, where A is a sparse matrix with 100,000+ rows and columns. The order in which the rows and columns occur can have a dramatic effect on the viability of a direct solver, e.g., the time taken to find x, the amount of memory needed, and the quality of x. We shall consider symmetric matrices and, with the help of playdough, explore how best to order the rows/columns using a nested dissection strategy. Starting with a straightforward strategy, we will discover the pitfalls and develop an adaptive strategy with the aim of coping with a large variety of sparse matrix structures.
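
To make the idea of nested dissection concrete, here is a minimal sketch for the simplest possible case: a path graph, which corresponds to a symmetric tridiagonal matrix. The middle vertex is taken as the separator, the two halves are ordered recursively, and the separator is numbered last. The function name `nested_dissection_path` is hypothetical, and this is not the adaptive strategy developed in the talk.

```python
def nested_dissection_path(lo, hi):
    """Return a nested dissection ordering of vertices lo..hi-1 of a path graph.

    The separator (middle vertex) is placed after both halves, so eliminating
    the halves first cannot create fill between them.
    """
    if hi - lo <= 2:
        # Pieces of at most two vertices are kept in their natural order.
        return list(range(lo, hi))
    mid = (lo + hi) // 2  # single-vertex separator splitting the path in two
    left = nested_dissection_path(lo, mid)
    right = nested_dissection_path(mid + 1, hi)
    return left + right + [mid]


order = nested_dissection_path(0, 7)
```

For general sparse matrices the hard part is finding small separators that split the graph into balanced pieces, which the path graph makes trivial.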

Some of the talk will involve the audience playing with playdough, so bring your inner child along with you!

Tue, 28 Oct 2014

14:30 - 15:00

Sparse Compressed Threshold Pivoting

Jonathan Hogg

(STFC Rutherford Appleton Laboratory)

Abstract

Traditionally, threshold partial pivoting is used to ensure stability of sparse LDL^T factorizations of symmetric matrices. This involves comparing a candidate pivot with all entries in its row/column to ensure that growth in the size of the factors is limited by a threshold at each stage of the factorization. It is capable of delivering a scaled backward error on the level of machine precision for practically all real-world matrices. However, it has significant flaws when used in a massively parallel setting, such as on a GPU or modern supercomputer. It requires all entries of the column to be up-to-date and requires an all-to-all communication for every column. The latter requirement can be performance limiting, as the factorization cannot proceed faster than k*(communication latency), where k is the length of the longest path in the sparse elimination tree.
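
The traditional test described above can be sketched for a single 1x1 pivot as follows. This is a simplified illustration, not the speaker's code: practical symmetric indefinite solvers also consider 2x2 pivots, and the function name `passes_threshold_test` and the default threshold `u=0.01` are assumptions made for this example.

```python
def passes_threshold_test(column, pivot_index, u=0.01):
    """Return True if column[pivot_index] is acceptable as a 1x1 pivot.

    The pivot is accepted when |pivot| >= u * max|other entries|, which bounds
    every multiplier |entry / pivot| in this column by 1/u.
    """
    pivot = abs(column[pivot_index])
    # The maximum runs over the whole column, so every entry must be
    # up to date before the decision can be made.
    largest_other = max(
        (abs(v) for i, v in enumerate(column) if i != pivot_index),
        default=0.0,
    )
    return pivot > 0.0 and pivot >= u * largest_other


accepted = passes_threshold_test([4.0, 1.0, -2.0], 0)
rejected = passes_threshold_test([1e-6, 1.0], 0)
```

In a distributed setting the maximum over the column is a global reduction, which is the per-column communication the abstract identifies as the bottleneck.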

We introduce a new family of communication-avoiding pivoting techniques that reduce the number of messages required by a constant factor, allowing the communication cost to be more effectively hidden by computation. We exhibit two members of this family. The first delivers stability equivalent to that of threshold partial pivoting, but is more pessimistic, leading to additional fill in the factors. The second provides fill levels similar to those of traditional techniques and, whilst demonstrably unstable for pathological cases, is able to deliver machine accuracy on even the worst real-world examples.

Tue, 17 Jun 2014

14:00 - 14:30

Memory efficient incomplete factorization preconditioners for sparse symmetric systems

Jennifer Scott

(STFC Rutherford Appleton Laboratory)

Abstract

Incomplete Cholesky (IC) factorizations have long been an important tool in the armoury of methods for the numerical solution of large sparse symmetric linear systems Ax = b. In this talk, I will explain the use of intermediate memory (memory that is used in the construction of the incomplete factorization but is subsequently discarded) and show how it can significantly improve the performance of the resulting IC preconditioner. I will then focus on extending the approach to sparse symmetric indefinite systems in saddle-point form. A limited-memory signed IC factorization of the form LDL^T is proposed, where the diagonal matrix D has entries +/-1. The main advantage of this approach is its simplicity, as it avoids the use of numerical pivoting. Instead, a global shift strategy is used to prevent breakdown and to improve performance. Numerical results illustrate the effectiveness of the signed incomplete Cholesky factorization as a preconditioner.
