(1990) have generalized this family of methods. Interpolating schemes are better for well-
resolved flows, but non-interpolating SL may be very valuable in integrating climate and
ocean models at coarse resolution without excessive computational damping.
Williamson and Rasch (1989), Rasch and Williamson (1990a), and Bermejo and Staniforth
(1992) have studied “shape-preserving” or “quasi-monotone” algorithms. The key idea
is to replace ordinary interpolation by more exotic near-interpolation schemes that are
constrained to preserve essential properties of the flow such as smoothness and, for
densities, non-negativity.
Priestley (1993) and Gravel and Staniforth (1994) have generalized SL to be quasi-
conservative of mass. The slight adjustments to conserve the total fluid mass are made
preferentially near the fronts where the Gibbs’ oscillations, which cause most of the mass
loss or gain, are concentrated.
Bates and McDonald (1982), Bates (1984), Bates et al. (1990), McDonald (1984, 1986, 1987)
and McDonald and Bates (1987, 1989) have developed finite difference SL models. Bates
et al. (1990) show that the “split-explicit” and other time-splitting schemes are not nearly
as accurate as an unsplit semi-implicit algorithm for the long time steps used with SL codes.
Environment Canada has developed finite element SL models for forecasting (Côté, Gravel
and Staniforth, 1990).
One major difficulty is that flow over mountains can trigger spurious, unphysical insta-
bility in SL/SI schemes. After much struggle, ways to fix up SL to cope with topographic
instability have been found, as explained in Côté, Gravel and Staniforth (1995).
SL schemes have also been independently invented under other names. “Particle-in-
cell” schemes and their close connection with SL have been discussed by Bermejo (1990),
who proves that a common particle-in-cell method is just SL with cubic spline off-grid
interpolation.
The family of methods known variously as “characteristic Galerkin” or “Lagrange-
Galerkin” or “Eulerian-Lagrangian” was independently invented outside of meteorology
by Douglas and Russell (1982), Morton (1985) and Pironneau (1982). It has been greatly
developed in the finite element world.

14.13 Summary
Semi-Lagrangian methods have been a big hit in meteorology and, under other names,
have had some success in engineering fluid mechanics, too. Most of the thorny problems
have been solved, but some issues remain.
First, can the fixers which stabilize SL against topographic instabilities be simplified
and improved? Second, can high order off-grid interpolation improve, both in speed and
in conservation and shape-preserving properties, so that consistent spectral SL algorithms
become feasible for large models like weather forecasting codes? Third, can better and
simpler high-order-in-time SL methods be devised? Going beyond second order in time
seems to exact a heavy price.
Chapter 15

Iterative & Direct Methods for
Solving Matrix Equations

“Practice in numerical methods is the only way of learning it”
– H. Jeffreys and B. Jeffreys

“A good scientist is one who knows enough to steal from the best people”
– J. R. Holton

15.1 Introduction
For all but the simplest problems (those with coefficients which are polynomials),
Galerkin’s method and the pseudospectral method lead to full matrices, unlike the sparse
systems of equations generated by finite difference and finite element algorithms. This
is particularly unfortunate since implicit and semi-implicit marching procedures require
solving a boundary value problem at each time step. Since the expense of Gaussian elimi-
nation grows as O(N^3) and N is O(1000) even in two dimensions with modest resolution,
spectral methods are prohibitively expensive for most problems without highly efficient
iterative methods for solving the ensuing matrix equations.
For special problems, it is possible to replace iterations with direct methods. The Haidvogel-
Zang algorithm for separable partial differential equations is the theme of one section. When
the equation is constant coefficient, one may sometimes obtain a sparse matrix by applying
Galerkin’s method, as explained in another.
In most of the chapter, we will implicitly assume that the matrix is positive definite, that
is, all its eigenvalues are of the same sign. The reason is that most simple iterations fail
when this condition is not met. Fortunately, both preconditioning and multigrid usually
eliminate the divergence caused by a few eigenvalues of the wrong sign. We explain why
and also describe a difficult case where none of our iterations can be expected to work.
Fortunately, such circumstances are very rare, and there are remedies even for problems
with an infinite number of both positive and negative eigenvalues.
Lastly, we describe how these iterations can be directly extended to nonlinear problems.
Two key themes that dominate earlier sections of the chapter, preconditioning and
iteration-as-marching-in-pseudotime, are important for nonlinear iterations, too.


15.2 Iteration-as-Diffusion, Stationary One-Step Iterations
& the Richardson/Euler Iteration
If the operator L is positive definite, i.e., if all its eigenvalues are positive, then one way to
solve the boundary value problem

Lu = f (15.1)

is to integrate the time-dependent equation

u_t = -Lu + f (15.2)

with the same boundary conditions and an arbitrary initial condition. Eq. (15.2) is a gener-
alized diffusion equation. Some iterative methods are based directly on (15.2) and others
are not, but the analogy of iteration-is-diffusion is very useful because all iterative meth-
ods “diffuse” the error until the solution settles down into the steady-state equilibrium
described by (15.1).
Because L is positive definite, all eigenmodes of L must decay with time. If L has one
or more negative eigenvalues, or a complex eigenvalue with a negative real part, then the
corresponding mode will amplify; in other words, (15.2) is unstable. Consequently, differential
equations with eigenvalues of both signs require special treatment, and we will assume L
is positive definite in this section and the next.
Eq. (15.2) seems to have made the problem more complex rather than simpler, but for
the generalized diffusion equation, an explicit scheme costs only O(N) operations per time
step. In contrast, solving the discretized boundary value problem through Gaussian elimi-
nation costs O(N^3) operations, where N is the number of basis functions or grid points.
The simplest iterative method for (15.1) is to solve (15.2) by the first order Euler method,

u^{n+1} = (I - τΛ) u^n + τ f [Richardson/Euler/Jacobi] (15.3)

where Λ is the matrix which is the pseudospectral discretization of the differential operator
L. This was first suggested by L. F. Richardson (1910). The parameter τ is the time step;
since we are only interested in the steady state solution, the best choice for τ is only a little
smaller than the time-stepping stability limit.
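As a concrete illustration (not from the text), the Richardson/Euler iteration (15.3) can be sketched in a few lines of Python. The matrix below is a random symmetric positive definite stand-in for the pseudospectral matrix Λ, and `tau` is taken a little below the forward-Euler stability limit 2/λ_max:

```python
import numpy as np

# Symmetric positive definite test matrix (a stand-in for the
# pseudospectral discretization Lambda) and right-hand side f.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 20))
Lam = B @ B.T + 20 * np.eye(20)   # SPD by construction
f = rng.standard_normal(20)

eigs = np.linalg.eigvalsh(Lam)
lam_min, lam_max = eigs[0], eigs[-1]

# Forward Euler on u_t = -Lam u + f is stable for tau < 2/lam_max;
# take tau a little below that limit, as the text recommends.
tau = 1.9 / lam_max

u = np.zeros(20)
for _ in range(5000):
    u = u - tau * (Lam @ u - f)   # u^{n+1} = (I - tau*Lam) u^n + tau*f

exact = np.linalg.solve(Lam, f)
err = np.max(np.abs(u - exact))
```

Because each sweep only shrinks the error by a fixed factor (the spectral radius of I - τΛ), many iterations are needed even for this small system; the payoff, as discussed below, is that each sweep is cheap.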
The Jacobi iteration, also described in many textbooks, is identical with Richardson’s if
the diagonal matrix elements of Λ are rescaled to unity. Thus, the Jacobi and Richardson
iterations are conceptually identical. In the rest of the chapter, we omit the rescaling and
label this iteration with the name of Richardson.
Richardson’s iteration is the prototype for a whole family of methods known as “sta-
tionary one-step” iterations, which have the general form

u^{n+1} = G u^n + k [Stationary, One-Step] (15.4)

where G is a square matrix and k is a column vector. The iteration is quite useless unless it
reduces to an identity when u^n is the exact solution of Λu = f, so we have the constraint

G = I - R^{-1} Λ ;  k = R^{-1} f (15.5)

for some square matrix R, but this still allows great diversity.

Until the early 70s, the Gauss-Seidel iteration and its refinement, Successive Over-
Relaxation (SOR), were the stars. Unfortunately, they update matrix elements by using
a mixture of both old values and new ones already calculated on the same time level. Nei-
ther a vector processor nor the indirect pseudospectral algorithm (Sec. 4) can cope with
this mix, so we shall not discuss the Gauss-Seidel and SOR methods further.
In order to estimate the optimum τ and also to provide the framework for the general-
izations of Richardson’s method discussed later, we now turn to a bit of theory.
Matrix functions can always be evaluated, at least in principle, by using an expansion
in terms of the eigenfunctions of the matrix. If φ_j denotes the eigenvectors of the square
matrix G with λ_j as the corresponding eigenvalues, then successive iterates of a stationary
one-step method will be related by

u^{n+1} = Σ_j λ_j u_j^n φ_j + k (15.6)

where the u_j^n are the coefficients of the expansion of u^n in the eigenvectors of G. Define the
error at the n-th iteration as
error at the n-th iteration as

e^{(n)} ≡ u - u^{(n)} [Error at n-th step] (15.7)

Substituting this into (15.4) and recalling Λ u = f gives

e^{(n+1)} = G e^{(n)} (15.8)

Let ρ denote the magnitude of the largest eigenvalue of the iteration matrix G, which is
called the “spectral radius” of the matrix, and let φ_biggest denote the corresponding eigen-
vector. For sufficiently large iteration number, the faster-decaying components of the error
will be negligible and

|e^{(n)}| ∼ ρ |e^{(n-1)}| as n → ∞ (15.9)

with e^{(n)} ∼ c φ_biggest for some constant c. Thus, the rate of convergence is always geometric.
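A quick numerical check of (15.9): the sketch below iterates (15.4) with an arbitrary 2×2 contraction matrix G, chosen purely for illustration rather than derived from any differential equation, and watches successive error ratios settle onto the spectral radius:

```python
import numpy as np

# A small iteration matrix G with spectral radius < 1 (eigenvalues 0.8, 0.4).
G = np.array([[0.5, 0.3],
              [0.1, 0.7]])
rho = max(abs(np.linalg.eigvals(G)))

k = np.array([1.0, 2.0])
u_exact = np.linalg.solve(np.eye(2) - G, k)   # fixed point of u = G u + k

u = np.zeros(2)
errs = []
for _ in range(60):
    u = G @ u + k
    errs.append(np.linalg.norm(u_exact - u))

# Successive error ratios approach rho: geometric convergence.
ratio = errs[-1] / errs[-2]
```

After a few dozen sweeps the subdominant eigenvalue's contribution is negligible and the error decays by almost exactly the factor ρ per iteration, which is the content of (15.9).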
Optimizing an iteration is therefore equivalent to minimizing the spectral radius of the
iteration matrix. Let λ_min and λ_max denote the smallest and largest eigenvalues of Λ, which
for simplicity we assume are real. Then for Richardson’s iteration, the corresponding eigen-
values γ_min and γ_max of the iteration matrix G are

G ≡ I - τΛ   =⇒   γ_min = 1 - τ λ_max ,  γ_max = 1 - τ λ_min (15.10)

Fig. 15.1 shows that the spectral radius is minimized when |γ_min| = γ_max, which gives

τ = 2 / (λ_min + λ_max) [Optimum τ; Richardson’s] (15.11)

ρ(G) = (λ_max - λ_min) / (λ_max + λ_min) [Smallest Spectral Radius] (15.12)
     ≈ 1 - 2 λ_min / λ_max
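Formulas (15.11) and (15.12) are easy to verify numerically. The sketch below uses a diagonal stand-in for Λ with an evenly spaced spectrum (an assumption made purely for illustration), so the eigenvalues of G = I - τΛ can be written down directly:

```python
import numpy as np

# Diagonal test matrix with a known spectrum, standing in for Lambda.
lam = np.linspace(1.0, 100.0, 50)   # eigenvalues in [lam_min, lam_max]
lam_min, lam_max = lam[0], lam[-1]

tau_opt = 2.0 / (lam_min + lam_max)                # Eq. (15.11)
rho = (lam_max - lam_min) / (lam_max + lam_min)    # Eq. (15.12)

# Eigenvalues of G = I - tau*Lam at the optimal tau:
g = 1.0 - tau_opt * lam
spectral_radius = np.max(np.abs(g))   # equals rho

# Any other choice of tau gives a larger spectral radius.
for tau in (0.5 * tau_opt, 1.2 * tau_opt):
    assert np.max(np.abs(1.0 - tau * lam)) > rho
```

At the optimum, the two extreme eigenvalues of G have equal magnitude and opposite sign, which is exactly the balance point |γ_min| = γ_max seen in Fig. 15.1.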

Figure 15.1: The relationship between the eigenvalues of the matrix Λ and those of
Richardson’s iteration matrix, G ≡ I - τΛ, where I is the identity matrix and τ is the
“time step”. (a) The eigenvalues of Λ lie on the positive interval λ ∈ [λ_min, λ_max]. (b) The
eigenvalues of G lie on λ ∈ [(1 - τ λ_max), (1 - τ λ_min)].

One difficulty is that we normally do not know either λ_min or λ_max. However, we shall
see later that it is not too hard to estimate them, especially when using a preconditioner.
The other, far more serious problem is that typically

λ_max / λ_min ∼ O(N^2) [2d order equations] (15.13a)
λ_max / λ_min ∼ O(N^k) [k-th order equations] (15.13b)

independent of the number of spatial dimensions. This implies that for the commonest
case of a 2d order equation, log(ρ) ∼ O(1/N^2), which in turn means that O(N^2) iterations
are needed to obtain even moderate accuracy. For a fourth order equation, the number of
iterations increases to O(N^4).
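The scaling (15.13a) can be made concrete in the periodic (Fourier) case, an assumption made here for simplicity: the second-derivative eigenvalues are then simply k² for wavenumbers k = 1, ..., N/2, so the eigenvalue ratio, and with it the Richardson iteration count, grows as N²:

```python
import numpy as np

# For the 1D periodic (Fourier) second derivative, the eigenvalues are
# k^2 for k = 1, ..., N/2, so lam_max/lam_min = (N/2)^2 grows like N^2.
for N in (16, 64, 256):
    k = np.arange(1, N // 2 + 1)
    lam = k.astype(float) ** 2
    ratio = lam[-1] / lam[0]                          # = (N/2)^2
    rho = (lam[-1] - lam[0]) / (lam[-1] + lam[0])     # Eq. (15.12)
    # Iterations to shrink the error by a factor of 10^6 at the optimal tau:
    iters = np.log(1e-6) / np.log(rho)
    print(N, ratio, int(iters))
```

Doubling N roughly quadruples both the eigenvalue ratio and the number of iterations, which is the O(N²) iteration count quoted in the text for second order equations.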
We will show in a later section that with heavy use of the FFT, it is possible to evaluate
each Richardson iteration in O(N^2 log N) operations in two dimensions, so the total cost
for a second order BVP is O(N^4 log N) operations (with a large proportionality constant).
Fortunately, it is possible to tremendously improve the efficiency of the Richardson
iteration through three modifications: (i) Chebyshev acceleration, (ii) preconditioning, and
(iii) multigrid. The first was discussed in the first edition of this book; the second and third
are described here. All three can be used either individually or in combination to accelerate
the rate of convergence of iteration schemes.

15.3 Preconditioning: Finite Difference
The reason that Richardson’s iteration converges with such lamentable slowness is that

λ_max / λ_min ≫ 1 (15.14)

Preconditioning is a strategy for replacing Λ in the iteration by a new matrix

A ≡ H^{-1} Λ (15.15)

where the “preconditioning matrix” H is chosen to meet two criteria:

1. H must be easily invertible

2. The ratio of the largest eigenvalue of A to the smallest must be much smaller than the
ratio for Λ:

α_max / α_min ≪ λ_max / λ_min

From the viewpoint of the second criterion alone, the ultimate preconditioning matrix
is Λ itself, since then A = I, the identity matrix, whose eigenvalues are all one. Such a choice is
obviously insane because if we could easily compute Λ^{-1}, we would not need the iteration
in the first place! Still, it is suggestive: what we want is a matrix that is much simpler than
Λ, but in some sense approximates it.
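As a preview of the finite difference preconditioning discussed next, consider the periodic (Fourier) case, a simplification assumed here for illustration. There the spectral and second order finite difference second-derivative operators share the eigenvectors e^{ikx}, so the eigenvalues of A = H^{-1}Λ are just scalar ratios:

```python
import numpy as np

# Periodic (Fourier) case on a grid of N points, h = 2*pi/N:
#   spectral 2nd derivative eigenvalues:  k^2
#   2nd order FD eigenvalues:             (4/h^2) sin^2(k h / 2)
# The eigenvalues of A = H^{-1} Lambda are their ratios.
N = 64
h = 2 * np.pi / N
k = np.arange(1, N // 2 + 1)
lam_spec = k.astype(float) ** 2
lam_fd = (4.0 / h**2) * np.sin(k * h / 2.0) ** 2
alpha = lam_spec / lam_fd          # eigenvalues of the preconditioned matrix

raw_ratio = lam_spec[-1] / lam_spec[0]       # O(N^2): 1024 here
cond_ratio = alpha.max() / alpha.min()       # bounded, below pi^2/4 ~ 2.47
```

The raw eigenvalue ratio of Λ grows like O(N²), while the preconditioned ratio stays below π²/4 no matter how large N becomes, so a preconditioned Richardson iteration converges in O(1) iterations instead of O(N²).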
The preconditioning strategy of Orszag (1980) is to take H to be the matrix of the usual
second order finite difference approximation to the differential operator L for which Λ is the
pseudospectral approximation¹. It is obvious that inverting the “finite difference precondi-
tioning” is cheap. For a second order differential equation in one dimension, for example,
H is an ordinary tridiagonal matrix which can be inverted in O(8N) operations. Thus, the
rate-determining step is not computing the inverse of H, but rather multiplying u^{(n)} by

