. 106
( 136 .)


replacing the upper limits on the sums by the skyline function σ(i).
The factorization allows the original problem

Ax = f

to be written without approximation as

Lg = f

Ux = g

The “backsolve” step is to compute x from these triangular matrix equations in O(2N 2 )
operations through the recursions (“backsolve” steps)

g1 = f1
± 
1 
fi ’ (B.12)
gi = lij gj i = 2, 3, . . . , N
lii  

xN = gN
xi = gi ’ i = (N ’ 1), (N ’ 2), . . . , 2 (B.14)
uij xj

Like the LU factorization, the backsolve step can also be greatly reduced in cost by exploit-
ing the sparsity of the matrix A: eliminating operations on elements known a priori to be
In many cases, a general skyline solver is inef¬cient because the matrix has very simple
patterns of sparsity. We describe three useful special cases of sparse matrices in the next
four sections.

B.2 Banded Matrix
De¬nition 48 (BANDWIDTH OF A MATRIX) A matrix A is said to be of BANDWIDTH
(2m + 1) if its elements satisfy the equation

for all |i ’ j| > m (B.15)
aij = 0

The special case m = 1 has at most three non-zero elements in each row (or column) and is called
“TRIDIAGONAL”. The special case m = 2 has at most ¬ve non-zero elements in each row and is

Theorem 42 (LU DECOMPOSITION: BANDED MATRIX) The LU decomposition of a ma-
trix of bandwidth (2m + 1) has factors which are of the same bandwidth, (2m + 1), in addition to
being triangular.
The cost of performing an LU decomposition is

# multiplications = N (m2 + m) (B.16)
# additions = N m2 (B.17)

Eqs. (B.5) through (B.13) still apply; we merely truncate the sums to re¬‚ect the bandwidth of A,
L, and U.
A number of special cases are collected in Table B.2.

PROOF: The bandwidth preserving property can be shown directly from (B.5).

Table B.2: Operation Counts for Gaussian Elimination.
There are two steps in solving a matrix equation: (i) the LU factorization of the square
matrix A, which must be repeated whenever A is altered and (ii) the backsolve, which must
be repeated for each different column vector f . The costs for each step are shown separately
below because it is common (in semi-implicit time stepping algorithms, for instance) to
perform only a single LU factorization followed by many backsolves (one per time step).
Below, we list the costs for general m under the simplifying assumptions that (i) N m
where N is the dimension of the matrix and (ii) multiplications and divisions are equally
costly and additions and subtractions are equally costly. Warning: all entries assume piv-
oting is unnecessary.

(a) Full matrix

N 3 /3 mults. N 3 /3 adds.
LU factorization: &
N 2 mults. N 2 adds.
Backsolve: &

(b) Banded matrix of bandwidth m

N (m2 + m) mults. N m2 adds.
LU factorization: &
Backsolve: N (2m + 1) mults. & N (2m) adds.

(c) Special bandwidths.

The ¬rst number listed in each pair is the number of multiplications or divisions; the second
is the number of additions and subtractions.

Name LU total Backsolve LU +Backsolve

1 [Tridiagonal] 2N & N 3N & 2N 5 N mult. & 3 N add.
2 [Pentadiagonal] 6N & 4N 5N & 4N 11 N mult. & 8 N add.
3 [Heptadiagonal] 12N & 9N 7N & 6N 19 N mult. & 15 N add.
4 [Enneadiagonal] 20N & 16N 9N & 8N 29 N mult. & 24 N add.
5 [Hendecadiagonal] 30N & 25N 11N & 10N 41 N mult. & 35 N add.
6 [Tridecadiagonal] 42N & 36N 13N & 12N 55 N mult. & 48 N add.

B.3 Matrix-of-Matrices Theorem

All the ordinary rules of matrix multiplication and factorization, including the LU decomposi-
tion for full and banded matrices, still apply when the elements of the matrix are themselves matrices,
provided one is careful about the non-commutativity of matrix multiplication.

PROOF: Matrix operations (addition, subtraction, multiplication, and division) obey ex-
actly the same rules as those for ordinary numbers except that division must be interpreted
as multiplication by the inverse matrix and multiplication and division are not commuta-
tive. It follows that formulas like (B.5) “ (B.13) will remain true when the aij , lij , and uij
are matrices so long as we do not reverse the order of any factors. The division by lii in
(B.11) was written as left multiplication by the inverse of lii precisely for this reason. Q. E.
This matrix-of-matrices concept has many interesting applications. In the next two sec-
tions, we shall discuss a couple.

B.4 Block-Banded Elimination: the “Lindzen-Kuo” Algo-

When the usual second order ¬nite differences are applied to a second order boundary
value problem in two dimensions1 , the result is a matrix of dimension (M N ) — (M N ).
When the differential equation is everywhere elliptic and suf¬ciently well-behaved, the
matrix problem may be quite ef¬ciently solvable through Successive Over-Relaxation or
some other iterative method. However, the iterations usually diverge if the matrix is not
positive de¬nite, and this can happen even for a harmless Helmholtz equation like


for constant K where K is suf¬ciently large and positive. (However, multi-grid iterations
may work even for an inde¬nite matrix if Gaussian elimination is used on the coarsest
An alternative is to apply Gaussian elimination (Lindzen and Kuo, 1969) in such a way
as to exploit the “block-banded” structure of the matrix. By “block-banded” we mean
that the matrix is tridiagonal if we group square blocks of dimension M — M together.
Alternatively, we can simply treat the “block-banded” matrix as an ordinary banded matrix
of bandwidth 3M , but it is conceptually simpler and algorithmically more ef¬cient to use
the block structure instead.
The history of the Lindzen-Kuo note is rather illuminating. Although the method is
often called the “Lindzen-Kuo” method in geophysics since most applications in that area
trace their ancestry back to their note, band LU factorization is a very obvious idea. They
themselves eventually found an earlier reference in Richtmyer (1957) but chose to write a
note anyway because most numerical analysis texts did not discuss Gaussian elimination
even in the context of one-dimensional boundary value problems, preferring “shooting”
1 Thesame is true in three dimensions, but we will discuss only two dimensions for simplicity ” and because
the matrices are usually too large for this method to be practical in three dimensions.

The reason for this is not that block-banded elimination doesn™t work, but rather that it
is dif¬cult to prove that it does. Lindzen and Kuo found empirically, however, that block-
banded elimination is very robust. It worked for “stiff” one-dimensional problems with
many alternating regions of exponential and oscillatory behavior, applied with as many
as 2000 grid points. It was equally successful in solving two-dimensional boundary value
problems of mixed elliptic-hyperbolic type. I have used the method with great success
myself, including cases in which one dimension was treated with ¬nite differences and the
other with Chebyshev polynomials.
Second order ¬nite differences give, for a one dimensional boundary value problem, an
ordinary tridiagonal matrix. For a two-dimensional BVP, these methods generate a block
tridiagonal matrix, that is to say, a matrix which is in the same form as a tridiagonal matrix
of dimension N except that each “element” is itself a matrix of dimension M where M is
the number of grid points in the second coordinate.
The beauty of the “matrix-of-matrices” concept is that it implies we can solve both one-
and two-dimensional problems by exactly the same algorithm. Since tridiagonal and block
tridiagonal matrices are so important in applications ” in particular, they are generated by
Chebyshev methods in a number of important special cases ” it is useful to write down
explicit formulas.
Since there are no more than three non-zero elements or blocks in each matrix, it is
helpful to denote each by a symbol with a single subscript. The LU factorization is

±1 0 0 0 1 δ1 0 0 a1 c1 0 0
b2 ±2 0 0 0 1 δ2 0 b a2 c2 0
=2 (B.19)
0 b3 ±3 0 0 0 1 δ3 0 b3 a3 c3
0 0 b4 ±4 0 0 0 1 0 0 b4 a4

for the 4—4 case. Note that the “subdiagonal” elements bi are identical with the “subdiag-
onal” of L, so we only have to compute two elements (or M — M matrices) for each of the
N rows of L and U.
The solution of the tridiagonal problem

Ax = f

is then

±1 = a1
g1 = (±1 )’1 f1 (B.22)
i = 1, . . . , N ’ 1 (B.23)
δi = ±i ci
±i+1 = ai+1 ’ bi+1 δi (B.24)
gi+1 = (±i+1 )’1 [fi+1 ’ bi+1 gi ] (B.25)

and the backward recurrence

xN = gN
xj = gj ’ δj xj+1 j = N ’ 1, N ’ 2, . . . , 1

When this is applied to block tridiagonal matrices, we have a problem: the inverse of
a sparse matrix is generally a full matrix. Consequently, even if the block matrices ai , bi ,

and ci are themselves tridiagonal, as would be true with ¬nite differences in both direc-
tions, all the δi and all the ±i matrices except ±1 are full matrices. For this reason, a mixed
pseudospectral-¬nite difference algorithm is very appealing (Boyd, 1978c); we sacri¬ce noth-
ing by using collocation with M points since the matrices generated during Gaussian elim-
ination will be full anyway, regardless of whether ai , bi , and ci are sparse (¬nite difference)
or full (pseudospectral).
Above, we segregated the LU factorization on the left and the steps for the backsolve
on the right because if we are only going to compute a single solution for a given matrix
A, it is most ef¬cient to compute the back-solve intermediate, g, on the same forward pass
in which we convert the ±i . That way, we only need to store the δi matrices at a cost of
(N M 2 ) memory cells. This method is simpler even for a one dimensional problem since
the complete solution is computed with just two DO loops.
To compute many solutions with the same matrix A but different f ” the normal sit-
uation in an implicit time-stepping code ” then one would perform the steps on the left
only once, and solve for g and x in a separate subroutine. Unfortunately, this needs the LU
factors of the ±i in addition to the δi , requiring 2N M 2 storage.
The rate-determining step is solving ±i δi = ci , which requires Gaussian elimination on
a total of N matrices at a cost of O(N M 3 ). This is obviously much more expensive than
an optimized iterative method, which is O(N M log M ). Each back-solve requires O(N M 2 )
It goes without saying that block-elimination is a method of last resort; iterative meth-
ods are always much faster when they can be applied. However, geophysical wave equa-
tions are often of mixed elliptic-hyperbolic or lack the property of positive de¬niteness
even when everywhere elliptic. The great strength of the block LU algorithm is that it is
very robust. It always gives a solution even for those ghastly equations of mixed type2 . It is
very successful in one-dimensional problems where “shooting” or even “parallel shooting”
would fail.
Geophysical applications include Lindzen (1970), Lindzen & Hong (1974), Forbes and
Garrett (1976), and Schoeberl & Geller (1977).

B.5 Block and “Bordered” Matrices: the Faddeev-Faddeeva
When Galerkin™s method with Chebyshev polynomials is applied to


. 106
( 136 .)