# Past Numerical Analysis Group Internal Seminar

It is well known that piecewise smooth signals are approximately sparse in a wavelet basis. However, other sparse representations are possible, such as the discrete gradient basis. It turns out that signals drawn from a random piecewise constant model have sparser representations in the discrete gradient basis than in Haar wavelets (with high probability). I will talk about this result and its implications, and also show some numerical experiments in which the use of the gradient basis improves compressive signal reconstruction.

One general approach to random number generation is to take a uniformly distributed (0,1) random variable and then invert the cumulative distribution function (CDF) to generate samples from another distribution. This talk follows this approach, approximating the inverse CDF for the Poisson distribution in a way which is particularly efficient for vector execution on NVIDIA GPUs.

Rational Krylov subspaces have been proven to be useful for many applications, like the approximation of matrix functions or the solution of matrix equations. It will be shown that extended and rational Krylov subspaces —under some assumptions— can be retrieved without any explicit inversion or system solves involved. Instead we do the necessary computations of $A^{-1} v$ in an implicit way using the information from an enlarged standard Krylov subspace.

It is well-known that both for classical and extended Krylov spaces, direct unitary similarity transformations exist providing us the matrix of recurrences. In practice, however, for large dimensions computing time is saved by making use of iterative procedures to gradually gather the recurrences in a matrix. Unfortunately, for extended Krylov spaces one is required to frequently solve, in some way or another a system of equations. In this talk both techniques will be integrated. We start with an orthogonal basis of a standard Krylov subspace of dimension $m+m+p$. Then we will apply a unitary similarity built by rotations compressing thereby significantly the initial subspace and resulting in an orthogonal basis approximately spanning an extended or rational Krylov subspace.

Numerical experiments support our claims that this approximation is very good if the large Krylov subspace contains $A^{-(m+1)} v$, …, $A^{-1} v$ and thus can culminate in nonneglectable dimensionality reduction and as such also can lead to time savings when approximating, e.g., matrix functions.

A stable algorithm to compute the roots of polynomials is presented. The roots are found by computing the eigenvalues of the associated companion matrix by Francis's implicitly-shifted $QR$ algorithm. A companion matrix is an upper Hessenberg matrix that is unitary-plus-rank-one, that is, it is the sum of a unitary matrix and a rank-one matrix. These properties are preserved by iterations of Francis's algorithm, and it is these properties that are exploited here. The matrix is represented as a product of $3n-1$ Givens rotators plus the rank-one part, so only $O(n)$ storage space is required. In fact, the information about the rank-one part is also encoded in the rotators, so it is not necessary to store the rank-one part explicitly. Francis's algorithm implemented on this representation requires only $O(n)$ flops per iteration and thus $O(n^{2})$ flops overall. The algorithm is described, backward stability is proved under certain conditions on the polynomial coefficients, and an extensive set of numerical experiments is presented. The algorithm is shown to be about as accurate as the (slow) Francis $QR$ algorithm applied to the companion matrix without exploiting the structure. It is faster than other fast methods that have been proposed, and its accuracy is comparable or better.

Traditionally threshold partial pivoting is used to ensure stability of sparse LDL^T factorizations of symmetric matrices. This involves comparing a candidate pivot with all entries in its row/column to ensure that growth in the size of the factors is limited by a threshold at each stage of the factorization. It is capabale of delivering a scaled backwards error on the level of machine precision for practically all real world matrices. However it has significant flaws when used in a massively parallel setting, such as on a GPU or modern supercomputer. It requires all entries of the column to be up-to-date and requires an all-to-all communication for every column. The latter requirement can be performance limiting as the factorization cannot proceed faster than k*(communication latency), where k is the length of the longest path in the sparse elimination tree.

We introduce a new family of communication-avoiding pivoting techniques that reduce the number of messages required by a constant factor allowing the communication cost to be more effectively hidden by computation. We exhibit two members of this family. The first deliver equivalent stability to threshold partial pivoting, but is more pessimistic, leading to additional fill in the factors. The second provides similar fill levels as traditional techniques and, whilst demonstrably unstable for pathological cases, is able to deliver machine accuracy on even the worst real world examples.

The relationship of diagonal dominance ideas to the convergence of stationary iterations is well known. There are a multitude of situations in which such considerations can be used to guarantee convergence when the splitting matrix (the preconditioner) is positive definite. In this talk we will describe and prove sufficient conditions for convergence of a stationary iteration based on a splitting with an indefinite preconditioner. Simple examples covered by this theory coming from Optimization and Economics will be described.

This is joint work with Michael Ferris and Tom Rutherford

This brief lecture will highlight several best-practice observations and

research for writing software for mathematical research, drawn from a

number of sources, including; Best Practices for Scientific Computing

[BestPractices], Code Complete [CodeComplete], and personal observation

from the presenter. Specific focus will be given to providing the

definition of important concepts, then describing how to apply them

successfully in day-to-day research settings. Following the outline from

Best Practices, we will cover the following topics:

* Write Programs for People, Not Computers

* Let the Computer Do the Work

* Make Incremental Changes

* Don't Repeat Yourself (or Others)

* Plan for Mistakes

* Optimize Software Only after It Works Correctly

* Document Design and Purpose, Not Mechanics

* Collaborate

[BestPractices]

http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001745

[CodeComplete] http://www.cc2e.com/Default.aspx