The factorization allows the original problem

A x = f    (B.8)

to be written without approximation as

L g = f    (B.9)

U x = g    (B.10)

APPENDIX B. DIRECT MATRIX-SOLVERS

The “backsolve” step is to compute x from these triangular matrix equations in O(2N^2) operations through the recursions

g_1 = \frac{1}{l_{11}} f_1    (B.11)

g_i = \frac{1}{l_{ii}} \left[ f_i - \sum_{j=1}^{i-1} l_{ij} g_j \right], \quad i = 2, 3, \ldots, N    (B.12)

x_N = g_N    (B.13)

x_i = g_i - \sum_{j=i+1}^{N} u_{ij} x_j, \quad i = (N-1), (N-2), \ldots, 1    (B.14)

Like the LU factorization, the backsolve step can also be greatly reduced in cost by exploit-

ing the sparsity of the matrix A: eliminating operations on elements known a priori to be

zero.
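As a concrete illustration (a sketch, not the book's code), the recursions (B.11)–(B.14) translate directly into two short loops. Here L and U are dense Python lists-of-lists, with U assumed to have a unit diagonal as in the Crout factorization used above:

```python
# Sketch of the backsolve recursions (B.11)-(B.14) for a factorization
# A = L U, where L is lower triangular and U is unit upper triangular.
# Dense storage, no pivoting; sparsity is deliberately not exploited here.

def backsolve(L, U, f):
    N = len(f)
    # Forward sweep (B.11)-(B.12): solve L g = f
    g = [0.0] * N
    for i in range(N):
        s = sum(L[i][j] * g[j] for j in range(i))
        g[i] = (f[i] - s) / L[i][i]
    # Backward sweep (B.13)-(B.14): solve U x = g (unit diagonal, no division)
    x = [0.0] * N
    for i in range(N - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, N))
        x[i] = g[i] - s
    return x

# Example: L U = [[2, 1], [1, 3.5]]; with x = [1, 2], f = A x = [4, 8]
L = [[2.0, 0.0], [1.0, 3.0]]
U = [[1.0, 0.5], [0.0, 1.0]]
print(backsolve(L, U, [4.0, 8.0]))  # -> [1.0, 2.0]
```

Each sweep costs about N^2 operations for a full matrix, matching the O(2N^2) count quoted above.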

In many cases, a general skyline solver is inefficient because the matrix has very simple patterns of sparsity. We describe three useful special cases of sparse matrices in the next four sections.

B.2 Banded Matrix

Definition 48 (BANDWIDTH OF A MATRIX) A matrix A is said to be of BANDWIDTH (2m + 1) if its elements satisfy the equation

a_{ij} = 0 \quad \text{for all } |i - j| > m    (B.15)

The special case m = 1 has at most three non-zero elements in each row (or column) and is called “TRIDIAGONAL”. The special case m = 2 has at most five non-zero elements in each row and is called “PENTADIAGONAL”.

Theorem 42 (LU DECOMPOSITION: BANDED MATRIX) The LU decomposition of a matrix of bandwidth (2m + 1) has factors which are of the same bandwidth, (2m + 1), in addition to being triangular.

The cost of performing an LU decomposition is

# multiplications = N (m^2 + m)    (B.16)

# additions = N m^2    (B.17)

Eqs. (B.5) through (B.13) still apply; we merely truncate the sums to reflect the bandwidth of A, L, and U.

A number of special cases are collected in Table B.2.

PROOF: The bandwidth-preserving property can be shown directly from (B.5).
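The bandwidth-preserving property is easy to observe numerically. The sketch below (an illustration, not the book's code) performs a Doolittle-style LU factorization without pivoting on a tridiagonal matrix and confirms that both factors are themselves tridiagonal; the convention (unit diagonal on L rather than on U) does not affect the bandwidth statement:

```python
# Sketch illustrating Theorem 42 (assumption: no pivoting is needed):
# the LU factors of a banded matrix inherit its bandwidth.

def lu_nopivot(a):
    """Doolittle LU factorization (L unit lower, U upper); no pivoting."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (a[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

tri = [[2.0, -1.0, 0.0, 0.0],
       [-1.0, 2.0, -1.0, 0.0],
       [0.0, -1.0, 2.0, -1.0],
       [0.0, 0.0, -1.0, 2.0]]
L, U = lu_nopivot(tri)
# Both factors have half-bandwidth m = 1: only the first sub/superdiagonal
# of L and U, respectively, is nonzero.
print(all(L[i][j] == 0.0 for i in range(4) for j in range(4) if i - j > 1))  # True
print(all(U[i][j] == 0.0 for i in range(4) for j in range(4) if j - i > 1))  # True
```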


Table B.2: Operation Counts for Gaussian Elimination.

There are two steps in solving a matrix equation: (i) the LU factorization of the square matrix A, which must be repeated whenever A is altered, and (ii) the backsolve, which must be repeated for each different column vector f. The costs for each step are shown separately below because it is common (in semi-implicit time stepping algorithms, for instance) to perform only a single LU factorization followed by many backsolves (one per time step). Below, we list the costs for general m under the simplifying assumptions that (i) N ≫ m, where N is the dimension of the matrix, and (ii) multiplications and divisions are equally costly, as are additions and subtractions. Warning: all entries assume pivoting is unnecessary.

(a) Full matrix

    LU factorization:  N^3/3 mults.  &  N^3/3 adds.
    Backsolve:         N^2 mults.    &  N^2 adds.

(b) Banded matrix of bandwidth (2m + 1)

    LU factorization:  N(m^2 + m) mults.  &  N m^2 adds.
    Backsolve:         N(2m + 1) mults.   &  N(2m) adds.

(c) Special bandwidths.

The first number listed in each pair is the number of multiplications or divisions; the second is the number of additions and subtractions.

    m   Name               LU total     Backsolve     LU + Backsolve
    1   Tridiagonal        2N  & N      3N  & 2N      5N mult.  & 3N add.
    2   Pentadiagonal      6N  & 4N     5N  & 4N      11N mult. & 8N add.
    3   Heptadiagonal      12N & 9N     7N  & 6N      19N mult. & 15N add.
    4   Enneadiagonal      20N & 16N    9N  & 8N      29N mult. & 24N add.
    5   Hendecadiagonal    30N & 25N    11N & 10N     41N mult. & 35N add.
    6   Tridecadiagonal    42N & 36N    13N & 12N     55N mult. & 48N add.


B.3 Matrix-of-Matrices Theorem

Theorem 43 (MATRICES WHOSE ELEMENTS ARE MATRICES) All the ordinary rules of matrix multiplication and factorization, including the LU decomposition for full and banded matrices, still apply when the elements of the matrix are themselves matrices, provided one is careful about the non-commutativity of matrix multiplication.

PROOF: Matrix operations (addition, subtraction, multiplication, and division) obey exactly the same rules as those for ordinary numbers, except that division must be interpreted as multiplication by the inverse matrix, and multiplication and division are not commutative. It follows that formulas like (B.5)–(B.13) remain true when the a_{ij}, l_{ij}, and u_{ij} are matrices, so long as we do not reverse the order of any factors. The division by l_{ii} in (B.11) was written as left multiplication by the inverse of l_{ii} precisely for this reason. Q.E.D.

This matrix-of-matrices concept has many interesting applications. In the next two sections, we shall discuss a couple.

B.4 Block-Banded Elimination: the “Lindzen-Kuo” Algorithm

When the usual second order finite differences are applied to a second order boundary value problem in two dimensions¹, the result is a matrix of dimension (MN) × (MN). When the differential equation is everywhere elliptic and sufficiently well-behaved, the matrix problem may be quite efficiently solvable through Successive Over-Relaxation or some other iterative method. However, the iterations usually diverge if the matrix is not positive definite, and this can happen even for a harmless Helmholtz equation like

\nabla^2 u + K u = f    (B.18)

for constant K where K is sufficiently large and positive. (However, multi-grid iterations may work even for an indefinite matrix if Gaussian elimination is used on the coarsest grid.)

An alternative is to apply Gaussian elimination (Lindzen and Kuo, 1969) in such a way as to exploit the “block-banded” structure of the matrix. By “block-banded” we mean that the matrix is tridiagonal if we group square blocks of dimension M × M together. Alternatively, we can simply treat the “block-banded” matrix as an ordinary banded matrix of bandwidth 3M, but it is conceptually simpler and algorithmically more efficient to use the block structure instead.

The history of the Lindzen-Kuo note is rather illuminating. Although the method is

often called the “Lindzen-Kuo” method in geophysics since most applications in that area

trace their ancestry back to their note, band LU factorization is a very obvious idea. They

themselves eventually found an earlier reference in Richtmyer (1957) but chose to write a

note anyway because most numerical analysis texts did not discuss Gaussian elimination

even in the context of one-dimensional boundary value problems, preferring “shooting”

instead.

¹The same is true in three dimensions, but we will discuss only two dimensions for simplicity, and because the matrices are usually too large for this method to be practical in three dimensions.


The reason for this is not that block-banded elimination doesn't work, but rather that it is difficult to prove that it does. Lindzen and Kuo found empirically, however, that block-banded elimination is very robust. It worked for “stiff” one-dimensional problems with many alternating regions of exponential and oscillatory behavior, applied with as many as 2000 grid points. It was equally successful in solving two-dimensional boundary value problems of mixed elliptic-hyperbolic type. I have used the method with great success myself, including cases in which one dimension was treated with finite differences and the other with Chebyshev polynomials.

Second order finite differences give, for a one dimensional boundary value problem, an ordinary tridiagonal matrix. For a two-dimensional BVP, these methods generate a block tridiagonal matrix, that is to say, a matrix which is in the same form as a tridiagonal matrix of dimension N except that each “element” is itself a matrix of dimension M, where M is the number of grid points in the second coordinate.

The beauty of the “matrix-of-matrices” concept is that it implies we can solve both one- and two-dimensional problems by exactly the same algorithm. Since tridiagonal and block tridiagonal matrices are so important in applications (in particular, they are generated by Chebyshev methods in a number of important special cases), it is useful to write down explicit formulas.

Since there are no more than three non-zero elements or blocks in each row, it is helpful to denote each by a symbol with a single subscript. The LU factorization is

\begin{pmatrix}
\alpha_1 & 0 & 0 & 0 \\
b_2 & \alpha_2 & 0 & 0 \\
0 & b_3 & \alpha_3 & 0 \\
0 & 0 & b_4 & \alpha_4
\end{pmatrix}
\begin{pmatrix}
1 & \delta_1 & 0 & 0 \\
0 & 1 & \delta_2 & 0 \\
0 & 0 & 1 & \delta_3 \\
0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
a_1 & c_1 & 0 & 0 \\
b_2 & a_2 & c_2 & 0 \\
0 & b_3 & a_3 & c_3 \\
0 & 0 & b_4 & a_4
\end{pmatrix}    (B.19)

for the 4 × 4 case. Note that the “subdiagonal” elements b_i are identical with the “subdiagonal” of L, so we only have to compute two elements (or M × M matrices) for each of the N rows of L and U.

The solution of the tridiagonal problem

A x = f    (B.20)

is then

\alpha_1 = a_1    (B.21)

g_1 = (\alpha_1)^{-1} f_1    (B.22)

\delta_i = (\alpha_i)^{-1} c_i, \quad i = 1, \ldots, N-1    (B.23)

\alpha_{i+1} = a_{i+1} - b_{i+1} \delta_i    (B.24)

g_{i+1} = (\alpha_{i+1})^{-1} \left[ f_{i+1} - b_{i+1} g_i \right]    (B.25)

and the backward recurrence

x_N = g_N    (B.26)

x_j = g_j - \delta_j x_{j+1}, \quad j = N-1, N-2, \ldots, 1
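In the scalar case these recurrences are the classic tridiagonal (Thomas) algorithm. A minimal Python sketch (an illustration, not the book's code; it assumes no pivoting is needed):

```python
# Scalar version of the recurrences (B.21)-(B.26): the Thomas algorithm.
# a = diagonal, b = subdiagonal (b[0] unused), c = superdiagonal
# (c[N-1] unused); no pivoting.

def thomas(a, b, c, f):
    N = len(a)
    alpha = [0.0] * N   # pivots alpha_i
    delta = [0.0] * N   # delta_i
    g = [0.0] * N
    alpha[0] = a[0]                                      # (B.21)
    g[0] = f[0] / alpha[0]                               # (B.22)
    for i in range(N - 1):
        delta[i] = c[i] / alpha[i]                       # (B.23)
        alpha[i+1] = a[i+1] - b[i+1] * delta[i]          # (B.24)
        g[i+1] = (f[i+1] - b[i+1] * g[i]) / alpha[i+1]   # (B.25)
    x = [0.0] * N
    x[N-1] = g[N-1]                                      # (B.26)
    for j in range(N - 2, -1, -1):
        x[j] = g[j] - delta[j] * x[j+1]
    return x

# Example: A = [[2, 1, 0], [1, 2.5, 1], [0, 1, 2.5]] with x = [1, 1, 1],
# so f = A x = [3, 4.5, 3.5].
print(thomas([2.0, 2.5, 2.5], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0],
             [3.0, 4.5, 3.5]))  # -> [1.0, 1.0, 1.0]
```

The operation count matches row m = 1 of Table B.2: roughly 5N multiplications/divisions and 3N additions/subtractions for factorization plus backsolve.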

When this is applied to block tridiagonal matrices, we have a problem: the inverse of a sparse matrix is generally a full matrix. Consequently, even if the block matrices a_i, b_i, and c_i are themselves tridiagonal, as would be true with finite differences in both directions, all the δ_i and all the α_i matrices except α_1 are full matrices. For this reason, a mixed pseudospectral-finite difference algorithm is very appealing (Boyd, 1978c); we sacrifice nothing by using collocation with M points since the matrices generated during Gaussian elimination will be full anyway, regardless of whether a_i, b_i, and c_i are sparse (finite difference) or full (pseudospectral).

Above, the LU factorization steps (B.21), (B.23), and (B.24) are interleaved with the backsolve steps (B.22) and (B.25) because, if we are only going to compute a single solution for a given matrix A, it is most efficient to compute the backsolve intermediate, g, on the same forward pass in which we compute the α_i. That way, we only need to store the δ_i matrices, at a cost of (N M^2) memory cells. This method is simpler even for a one dimensional problem since the complete solution is computed with just two DO loops.

To compute many solutions with the same matrix A but different f (the normal situation in an implicit time-stepping code), one would perform the factorization steps only once, and solve for g and x in a separate subroutine. Unfortunately, this needs the LU factors of the α_i in addition to the δ_i, requiring 2N M^2 storage.

The rate-determining step is solving α_i δ_i = c_i, which requires Gaussian elimination on a total of N matrices at a cost of O(N M^3). This is obviously much more expensive than an optimized iterative method, which is O(N M log M). Each back-solve requires O(N M^2) operations.
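The block version can be sketched with NumPy (an illustration under the same no-pivoting assumption, not the book's code): the scalar divisions become linear solves with the M × M blocks α_i, and the solve for δ_i is the O(M^3) rate-determining step just described.

```python
import numpy as np

# Block version of the recurrences (B.21)-(B.26): a, b, c are lists of
# M x M NumPy blocks (b[0] and c[N-1] unused), f is a list of length-M
# vectors.  Divisions by alpha_i become linear solves; the solve for
# delta_i dominates the cost at O(N M^3) overall.

def block_thomas(a, b, c, f):
    N = len(a)
    delta = [None] * N
    g = [None] * N
    alpha = a[0]                                             # (B.21)
    g[0] = np.linalg.solve(alpha, f[0])                      # (B.22)
    for i in range(N - 1):
        delta[i] = np.linalg.solve(alpha, c[i])              # (B.23)
        alpha = a[i+1] - b[i+1] @ delta[i]                   # (B.24)
        g[i+1] = np.linalg.solve(alpha, f[i+1] - b[i+1] @ g[i])  # (B.25)
    x = [None] * N
    x[N-1] = g[N-1]                                          # (B.26)
    for j in range(N - 2, -1, -1):
        x[j] = g[j] - delta[j] @ x[j+1]
    return x

# Example with 2 x 2 blocks, checked against a dense solve.
I = np.eye(2)
a = [4 * I, 4 * I, 4 * I]
b = [None, -I, -I]
c = [-I, -I, None]
f = [np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([3.0, 0.0])]
x = block_thomas(a, b, c, f)
A = np.block([[a[0], c[0], 0 * I], [b[1], a[1], c[1]], [0 * I, b[2], a[2]]])
print(np.allclose(np.concatenate(x), np.linalg.solve(A, np.concatenate(f))))  # True
```

Note that only the δ_i blocks need be retained for the backward recurrence, in keeping with the (N M^2) storage estimate above.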

It goes without saying that block-elimination is a method of last resort; iterative methods are always much faster when they can be applied. However, geophysical wave equations are often of mixed elliptic-hyperbolic type or lack the property of positive definiteness even when everywhere elliptic. The great strength of the block LU algorithm is that it is very robust. It always gives a solution, even for those ghastly equations of mixed type². It is very successful in one-dimensional problems where “shooting” or even “parallel shooting” would fail.

Geophysical applications include Lindzen (1970), Lindzen & Hong (1974), Forbes and

Garrett (1976), and Schoeberl & Geller (1977).

B.5 Block and “Bordered” Matrices: the Faddeev-Faddeeva Factorization

When Galerkin's method with Chebyshev polynomials is applied to