Thu, 26 Oct 2023
14:00
Lecture Room 3

Algebraic domain-decomposition preconditioners for the solution of linear systems

Tyrone Rees
(Rutherford Appleton Laboratory)
Abstract

The need to solve linear systems of equations is ubiquitous in scientific computing. Powerful methods for preconditioning such systems have been developed in cases where we can exploit knowledge of the origin of the linear system; a recent example from the solution of systems from PDEs is the Gen-EO domain decomposition method which works well, but requires a non-trival amount of knowledge of the underlying problem to implement.  

In this talk I will present a new spectral coarse space that can be constructed in a fully-algebraic way, in contrast to most existing spectral coarse spaces, and will give a theoretical convergence result for Hermitian positive definite diagonally dominant matrices. Numerical experiments and comparisons against state-of-the-art preconditioners in the multigrid community show that the resulting two-level Schwarz preconditioner is efficient, especially for non-self-adjoint operators. Furthermore, in this case, our proposed preconditioner outperforms state-of-the-art preconditioners.

This is joint work with Hussam Al Daas, Pierre Jolivet and Jennifer Scott.

Thu, 30 Jan 2020

14:00 - 15:00
L4

Using shared and distributed memory in the solution of large sparse systems

Iain Duff
(Rutherford Appleton Laboratory)
Abstract

We discuss the design of algorithms and codes for the solution of large sparse systems of linear equations on extreme scale computers that are characterized by having many nodes with multi-core CPUs or GPUs. We first use two approaches to get good single node performance. For symmetric systems we use task-based algorithms based on an assembly tree representation of the factorization. We then use runtime systems for scheduling the computation on both multicore CPU nodes and GPU nodes [6]. In this work, we are also concerned with the efficient parallel implementation of the solve phase using the computed sparse factors, and we show impressive results relative to other state-of-the-art codes [3]. Our second approach was to design a new parallel threshold Markowitz algorithm [4] based on Luby’s method [7] for obtaining a maximal independent set in an undirected graph. This is a significant extension since our graph model is a directed graph. We then extend the scope of both these approaches to exploit distributed memory parallelism. In the first case, we base our work on the block Cimmino algorithm [1] using the ABCD software package coded by Zenadi in Toulouse [5, 8]. The kernel for this algorithm is the direct factorization of a symmetric indefinite submatrix for which we use the above symmetric code. To extend the unsymmetric code to distributed memory, we use the Zoltan code from Sandia [2] to partition the matrix to singly bordered block diagonal form and then use the above unsymmetric code on the blocks on the diagonal. In both cases, we illustrate the added parallelism obtained from combining the distributed memory parallelism with the high single-node performance and show that our codes out-perform other state-of-the-art codes. This work is joint with a number of people. We developed the algorithms and codes in an EU Horizon 2020 Project, called NLAFET, that finished on 30 April 2019. Coworkers in this were: Sebastien Cayrols, Jonathan Hogg, Florent Lopez, and Stojce ´ ∗@email 1 Nakov. Collaborators in the block Cimmino part of the project were: Philippe Leleux, Daniel Ruiz, and Sukru Torun. Our codes available on the github repository https://github.com/NLAFET.

References [1] M. ARIOLI, I. S. DUFF, J. NOAILLES, AND D. RUIZ, A block projection method for sparse matrices, SIAM J. Scientific and Statistical Computing, 13 (1992), pp. 47–70. [2] E. BOMAN, K. DEVINE, L. A. FISK, R. HEAPHY, B. HENDRICKSON, C. VAUGHAN, U. CATALYUREK, D. BOZDAG, W. MITCHELL, AND J. TERESCO, Zoltan 3.0: Parallel Partitioning, Load-balancing, and Data Management Services; User’s Guide, Sandia National Laboratories, Albuquerque, NM, 2007. Tech. Report SAND2007-4748W http://www.cs.sandia. gov/Zoltan/ug_html/ug.html. [3] S. CAYROLS, I. S. DUFF, AND F. LOPEZ, Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model, Int. J. of High Performance Computing Applications, To appear (2019). NLAFET Working Note 20. RAL-TR-2018-008. [4] T. A. DAVIS, I. S. DUFF, AND S. NAKOV, Design and implementation of a parallel Markowitz threshold algorithm, Technical Report RAL-TR-2019-003, Rutherford Appleton Laboratory, Oxfordshire, England, 2019. NLAFET Working Note 22. Submitted to SIMAX. [5] I. S. DUFF, R. GUIVARCH, D. RUIZ, AND M. ZENADI, The augmented block Cimmino distributed method, SIAM J. Scientific Computing, 37 (2015), pp. A1248–A1269. [6] I. S. DUFF, J. HOGG, AND F. LOPEZ, A new sparse symmetric indefinite solver using a posteriori threshold pivoting, SIAM J. Scientific Computing, To appear (2019). NLAFET Working Note 21. RAL-TR-2018-012. [7] M. LUBY, A simple parallel algorithm for the maximal independent set problem, SIAM J. Computing, 15 (1986), pp. 1036–1053. [8] M. ZENADI, The solution of large sparse linear systems on parallel computers using a hybrid implementation of the block Cimmino method., These de Doctorat, ´ Institut National Polytechnique de Toulouse, Toulouse, France, decembre 2013.

Tue, 18 Feb 2020
14:30
L5

An element-based preconditioner for mixed finite element problems

Michael Wathen
(Rutherford Appleton Laboratory)
Abstract

We introduce a new and generic approximation to Schur complements arising from inf-sup stable mixed finite element discretizations of self-adjoint multi-physics problems. The approximation exploits the discretization mesh by forming local, or element, Schur complements and projecting them back to the global degrees of freedom. The resulting Schur complement approximation is sparse, has low construction cost (with the same order of operations as a general finite element matrix), and can be solved using off-the-shelf techniques, such as multigrid. Using the Ladyshenskaja-Babu\v{s}ka-Brezzi condition, we show that this approximation has favorable eigenvalue distributions with respect to the global Schur complement. We present several numerical results to demonstrate the viability of this approach on a range of applications. Interestingly, numerical results show that the method gives an effective approximation to the non-symmetric Schur complement from the steady state Navier-Stokes equations.
 

Tue, 18 Feb 2020
14:00
L5

FitBenchmarking: A tool for comparing fitting routines for our National Facilities (and beyond)

Tyrone Rees
(Rutherford Appleton Laboratory)
Abstract

In STFC's Computational Mathematics Group, we work alongside scientists at large-scale National Facilities, such as ISIS Neutron and Muon source, Diamond Light Source, and the Central Laser Facility. For each of these groups, non-linear least squares fitting is a vital part of their scientific workflow. In this talk I will describe FitBenchmarking, a software tool we have developed to benchmark the performance of different non-linear least squares solvers on real-world data. It is designed to be easily extended, so that new data formats and new minimizers can be added. FitBenchmarking will allow (i) scientists to determine objectively which fitting engine is optimal for solving their problem on their computing architecture, (ii) scientific software developers to quickly test state-of-the-art algorithms in their data analysis software, and (iii) mathematicians and numerical software developers to test novel algorithms against realistic datasets, and to highlight characteristics of problems where the current best algorithms are not sufficient.
 

Tue, 16 Feb 2016
14:30
L5

How accurate must solves be in interior point methods?

Tyrone Rees
(Rutherford Appleton Laboratory)
Abstract

At the heart of the interior point method in optimization is a linear system solve, but how accurate must this solve be?  The behaviour of such methods is well-understood when a direct solver is used, but the scale of problems being tackled today means that users increasingly turn to iterative methods to approximate its solution.  Current suggestions of the accuracy required can be seen to be too stringent, leading to inefficiency.

In this talk I will give conditions on the accuracy of the solution in order to guarantee the inexact interior point method converges at the same rate as if there was an exact solve.  These conditions can be shown numerically to be tight, in that performance degrades rapidly if a weaker condition is used.  Finally, I will describe how the norms that appear in these condition are related to the natural norms that are minimized in several popular Krylov subspace methods. This, in turn, could help in the development of new preconditioners in this important field.

Thu, 04 Feb 2016

14:00 - 15:00
Rutherford Appleton Laboratory, nr Didcot

Task-based multifrontal QR solver for heterogeneous architectures

Dr Florent Lopez
(Rutherford Appleton Laboratory)
Abstract

To face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming
models based on DAG parallelism regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. 

In this talk we present the design of task-based sparse direct solvers on top of runtime systems. In the context of the
qr_mumps solver, we prove the usability and effectiveness of our approach with the implementation of a sparse matrix multifrontal factorization based on a Sequential Task flow parallel programming model. Using this programming model, we developed features such as the integration of dense 2D Communication Avoiding algorithms in the multifrontal method allowing for better scalability compared to the original approach used in qr_mumps.

Following this approach, we move to heterogeneous architectures where task granularity and scheduling strategies are critical to achieve performance. We present, for the multifrontal method, a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources.   Finally we introduce a memory-aware algorithm to control the memory behavior of our solver and show, in the context of multicore architectures, an important reduction of the memory footprint for the multifrontal QR factorization with a small impact on performance.

Thu, 23 Nov 2000

14:00 - 15:00
Rutherford Appleton Laboratory, nr Didcot

A stopping criterion for the conjugate gradient algorithm in a finite element method framework

Dr Mario Arioli
(Rutherford Appleton Laboratory)
Abstract

We combine linear algebra techniques with finite element techniques to obtain a reliable stopping criterion for the Conjugate Gradient algorithm. The finite element method approximates the weak form of an elliptic partial differential equation defined within a Hilbert space by a linear system of equations A x = b, where A is a real N by N symmetric and positive definite matrix. The conjugate gradient method is a very effective iterative algorithm for solving such systems. Nevertheless, our experiments provide very good evidence that the usual stopping criterion based on the Euclidean norm of the residual b - Ax can be totally unsatisfactory and frequently misleading. Owing to the close relationship between the conjugate gradient behaviour and the variational properties of finite element methods, we shall first summarize the principal properties of the latter. Then, we will use the recent results of [1,2,3,4]. In particular, using the conjugate gradient, we will compute the information which is necessary to evaluate the energy norm of the difference between the solution of the continuous problem, and the approximate solution obtained when we stop the iterations by our criterion.

Finally, we will present the numerical experiments we performed on a selected ill-conditioned problem.

References

  • [1] M. Arioli, E. Noulard, and A. Russo, Vector Stopping Criteria for Iterative Methods: Applications to PDE's, IAN Tech. Rep. N.967, 1995.
  • [2] G.H. Golub and G. Meurant, Matrices, moments and quadrature II; how to compute the norm of the error in iterative methods, BIT., 37 (1997), pp.687-705.
  • [3] G.H. Golub and Z. Strakos, Estimates in quadratic formulas, Numerical Algorithms, 8, (1994), pp.~241--268.
  • [4] G. Meurant, The computation of bounds for the norm of the error in the conjugate gradient algorithm, Numerical Algorithms, 16, (1997), pp.~77--87.
Thu, 25 Jan 2007

14:00 - 15:00
Rutherford Appleton Laboratory, nr Didcot

GMRES preconditioned by a perturbed LDL^T decomposition with static pivoting

Dr Mario Arioli
(Rutherford Appleton Laboratory)
Abstract

A strict adherence to threshold pivoting in the direct solution of symmetric indefinite problems can result in substantially more work and storage than forecast by an sparse analysis of the symmetric problem. One way of avoiding this is to use static pivoting where the data structures and pivoting sequence generated by the analysis are respected and pivots that would otherwise be very small are replaced by a user defined quantity. This can give a stable factorization but of a perturbed matrix.

The conventional way of solving the sparse linear system is then to use iterative refinement (IR) but there are cases where this fails to converge. We will discuss the use of more robust iterative methods, namely GMRES and its variant FGMRES and their backward stability when the preconditioning is performed by HSL_M57 with a static pivot option.

Several examples under Matlab will be presented.

Thu, 03 Nov 2005

14:00 - 15:00
Comlab

Solving large sparse symmetric linear systems out of core

Dr John Reid
(Rutherford Appleton Laboratory)
Abstract

Direct methods for solving large sparse linear systems of equations are popular because of their generality and robustness. As the requirements of computational scientists for more accurate models increases, so inevitably do the sizes of the systems that must be solved and the amount of memory needed by direct solvers.

For many users, the option of using a computer with a sufficiently large memory is either not available or is too expensive. Using a preconditioned iterative solver may be possible but for the "tough" systems that arise from many practical applications, the difficulties involved in finding and computing a good preconditioner can make iterative methods infeasible. An alternative is to use a direct solver that is able to hold its data structures on disk, that is, an out-of-core solver.

In this talk, we will explain the multifrontal algorithm and discuss the design and development of a new HSL sparse symmetric out-of-core solver that uses it. Both the system matrix A and its factors are stored externally. For the indefinite case, numerical pivoting using 1x1 and 2x2 pivots is incorporated. To minimise storage for the system data, a reverse communication interface is used. Input of A is either by rows or by elements.

An important feature of the package is that all input and output to disk is performed through a set of Fortran subroutines that manage a virtual memory system so that actual i/o occurs only when really necessary. Also important is to design in-core data structures that avoid expensive searches. All these aspects will be discussed.

At the time of writing, we are tuning the code for the positive-definite case and have performance figures for real problems. By the time of the seminar, we hope to have developed the indefinite case, too.

Subscribe to Rutherford Appleton Laboratory