

The three non-zero elements of the lower triangular matrix L are

b_jl = B_jl ;   d_jl = D_jl                                         (15.41)

e_jl = E_jl - b_jl h_{j,l-1} - d_jl f_{j-1,l} - α { b_jl f_{j,l-1} + d_jl h_{j-1,l} }     (15.42)

where α = 0 for the first factorization, H_LU, and α = 1 for the second factorization, H_RS.
The non-zero elements of the upper triangular matrix U (aside from the 1's on the diagonal) are

f_jl = F_jl / e_jl ;   h_jl = H_jl / e_jl                           (15.43)

Note that we must march through the grid diagonally from top left to bottom right so that
we calculate the elements of U before we need them for the computation of the diagonal
elements of L.
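As a concrete sketch (not from the text), the sweep (15.41)-(15.43) can be coded as follows, with hypothetical coefficient arrays B, D, E, F, H holding the five-point operator. Note that a simple lexicographic (row-by-row) sweep also respects the dependencies, since e_jl needs only elements from the previous row and previous column.

```python
import numpy as np

def incomplete_factor(B, D, E, F, H, alpha=1.0):
    """Incomplete LU sweep (15.41)-(15.43); alpha=0 gives H_LU, alpha=1 gives H_RS."""
    J, L = E.shape
    e = np.zeros((J, L))
    f = np.zeros((J, L))
    h = np.zeros((J, L))
    b, d = B.copy(), D.copy()                      # (15.41): copied unchanged
    for j in range(J):
        for l in range(L):
            h_w = h[j, l - 1] if l > 0 else 0.0    # h_{j,l-1}
            f_s = f[j - 1, l] if j > 0 else 0.0    # f_{j-1,l}
            f_w = f[j, l - 1] if l > 0 else 0.0    # f_{j,l-1}
            h_s = h[j - 1, l] if j > 0 else 0.0    # h_{j-1,l}
            # (15.42): diagonal elements of L
            e[j, l] = (E[j, l] - b[j, l] * h_w - d[j, l] * f_s
                       - alpha * (b[j, l] * f_w + d[j, l] * h_s))
            # (15.43): off-diagonal elements of U (unit diagonal is implicit)
            f[j, l] = F[j, l] / e[j, l]
            h[j, l] = H[j, l] / e[j, l]
    return b, d, e, f, h

# Example: a 5-point Laplacian-like stencil on a small grid
n = 4
B = D = F = H = -np.ones((n, n))
E = 4.0 * np.ones((n, n))
b, d, e, f, h = incomplete_factor(B, D, E, F, H, alpha=1.0)
print(e[0, 0], e[0, 1])
```

Only the five arrays b, d, e, f, h are stored, so the factorization needs no more memory than the original matrix.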
What is striking about (15.41)-(15.43) is that there is no fill-in: the approximate factorization still gives only five non-trivial coefficients per grid point for L and U together. This
is in marked contrast to the exact LU decomposition of H_FD, which can be computed via
the block tridiagonal algorithm: the N × N blocks of the factored L & U matrices are dense,
even though the blocks of H_FD are sparse, so that there are roughly 5N non-trivial matrix
elements per grid point. Because the L and U matrices of the approximate factorizations
are so sparse, the necessary backsolves can be performed in O(N²) operations, that is to
say, the cost of inverting H_LU or H_RS is directly proportional to the total number of grid
points. This is the best that we could hope for, and implies that inverting H will not be the
rate-determining part of the iteration since evaluating (Λu) is always more expensive.

[Footnote 2: Unless the PDE is separable, in which case we should use the special spectral methods for this class of equations described in Sec. 15.11.]
The remaining issue is: how much does incomplete factorization increase the number
of iterations? The number of iterations needed is roughly proportional to the "condition number", κ ≡ λ_max/λ_min.
As shown above, κ ≤ 2.4 for finite difference preconditioning of a second order equation.
Tables 15.1 and 15.2 show that κ is much larger for both approximate factorizations than for H_FD, and,
worse, κ also increases with N.

Table 15.1: Extreme eigenvalues for the two-dimensional Chebyshev discretization of
∇²u = f. (From Zang et al., 1984)

              H_FD^-1 Λ           H_LU^-1 Λ           H_RS^-1 Λ
  N       λ_min    λ_max      λ_min    λ_max      λ_min    λ_max
  4       1.000    1.757      0.929    1.717      1.037    1.781
  8       1.000    2.131      0.582    2.273      1.061    2.877
 16       1.000    2.305      0.224    2.603      1.043    4.241
 24       1.000    2.361      0.111    2.737      1.031    5.379

For calculations on a single grid, H_RS is clearly superior to H_LU. For N = 24, the asymptotic convergence rate for H_RS is about 67% of that for the finite difference preconditioning.
Because it is much cheaper per iteration, approximate factorization can drastically reduce
the cost for boundary value problems in two or more spatial dimensions.
Canuto et al. (1988) observe that these incomplete-LU decompositions perform poorly on vector and parallel machines because they require a great deal of recursive [sequential] computation.
They suggest that ADI-type factorizations might be better, but so far little has been done.

Table 15.2: Condition Number κ for Preconditioned Chebyshev Operator in Two Dimensions [κ ≡ λ_max/λ_min]

                Single-Grid                 Multigrid
  N       H_LU^-1 Λ   H_RS^-1 Λ      H_LU^-1 Λ   H_RS^-1 Λ
  4          1.85        1.72            --          --
  8          3.91        2.71           1.79        2.07
 16         11.62        4.07           2.12        2.92
 24         24.66        5.22           2.26        3.79
Deville and Mund (1984, 1985, 1990, 1991, 1992), Deville, Mund and Van Kemenade
(1994), and others have shown that finite elements are also a very effective preconditioner
for spectral methods. One may use finite elements on triangular elements as the preconditioner even though the spectral approximation is defined on a quadrilateral or a set of
quadrilateral subdomains. The condition number is even smaller than with finite difference preconditioning. Furthermore, efficient parallelization of finite elements is a well-researched and well-understood task.
We shall not discuss finite element preconditioning in detail because (i) the principles
are exactly the same as for finite difference preconditioning and (ii) a detailed treatment
would be too great a digression into finite elements. This lack of space, however, should not
be interpreted as implying that finite elements are in any way inferior to finite differences
for preconditioning.

15.6 Raising the Order Through Preconditioning
In the previous two sections, finite difference and finite element methods have been humble servants of the spectral method. One can also interpret the same preconditioned algorithm from a perspective in which the spectral method is subservient.
Suppose one has a code to solve a boundary value problem using a second order
method. How can one obtain high accuracy without increasing the number of grid points
to a ridiculous extreme? Or even compute a crude answer when the low order scheme
needs more degrees of freedom than will fit in the available memory?
The answer is to write one additional subroutine to evaluate the residual of the boundary value problem using a high order method, such as a pseudospectral algorithm. If we set
up an iteration, repeatedly calling the low order BVP-solver with the output of the high order residual as the inhomogeneous term in the boundary value problem, then the iteration
will converge to an answer of high order accuracy even though all but the residual-evaluator
uses only low order schemes. The overall procedure is identical with the preconditioned
spectral iteration described in the two previous sections.
In chess, the lowly pawn can be promoted to a queen, the most powerful piece, after
reaching the last row of the chessboard. In numerical analysis, preconditioned iterations
provide a way to promote a low order method to spectral accuracy. This promotion does
not require rewriting the low order code or negotiating a thicket of enemy chess pieces. It
merely needs a single subroutine that will take the grid point values of the unknown as
input, and return the residual as the output. The spectral algorithms can be completely
isolated and encapsulated in this single subroutine or module.
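A minimal sketch of such a promotion (the model problem and the eigenvalue bound [1, 2.5] used for the Richardson parameter are illustrative assumptions, not from the text): solve u_xx − u = f with periodic boundary conditions, where the "low order code" is a second order finite difference matrix and the single extra subroutine evaluates the residual pseudospectrally by FFT.

```python
import numpy as np

N = 16
x = 2 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers 0, 1, ..., -1
u_exact = np.sin(3 * x)
f = -10.0 * np.sin(3 * x)                 # since u_xx - u = -(9 + 1) sin(3x)

# Low order operator: 3-point stencil for u_xx, minus the identity
h = 2 * np.pi / N
H = np.zeros((N, N))
for j in range(N):
    H[j, j] = -2.0 / h**2 - 1.0
    H[j, (j - 1) % N] = 1.0 / h**2
    H[j, (j + 1) % N] = 1.0 / h**2

def residual(u):
    """High order residual r = f - (u_xx - u), with u_xx evaluated by FFT."""
    u_xx = np.real(np.fft.ifft(-(k**2) * np.fft.fft(u)))
    return f - (u_xx - u)

# Richardson parameter for preconditioned eigenvalues assumed to lie in [1, 2.5]
tau = 2.0 / (1.0 + 2.5)
u = np.zeros(N)
for _ in range(30):
    u = u + tau * np.linalg.solve(H, residual(u))   # one call to the low order solver

print(np.max(np.abs(u - u_exact)))   # spectrally small despite the 2nd order solver
```

Everything spectral is confined to `residual`; the inner solve never changes, exactly as described above.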

15.7 Multigrid: An Overview
The reason that an unpreconditioned Richardson's iteration is so expensive is that the total
number of iterations,

N_iter = T / τ                                                      (15.44)

is large, where τ is the time step and where T is the total integration time, that is, the time
required for the solution of the diffusion equation to relax to the steady state. The difficulty
is that different components of the flow drive the numerator and denominator of (15.44) in
opposite directions. To see this, consider the forced diffusion equation

u_t = u_xx + f                                                      (15.45)

with periodic boundary conditions, u(x + 2π) = u(x).
The time step τ must be very small because the explicit, one-step Euler's method is
unstable unless

τ < 2 / k_max²                                                      (15.46)

where k_max is the maximum x wavenumber. This constraint on τ is due to the high wavenumbers in the solution.
The time interval T is large because the low wavenumbers decay very slowly:

u = a_1(0) e^{-t} cos(x) + b_1(0) e^{-t} sin(x) + ...               (15.47)

so T > 5 is necessary for even two decimal places of accuracy. The total number of iterations, T/τ, is enormous both because the numerator is large and because the denominator
is small.
Multigrid exploits the fact that these two problems (large T, small τ) come from
different parts of the wavenumber spectrum. Since the high wavenumber components
decay very rapidly, the error in the large k Fourier coefficients has disappeared in only a
fraction of the total integration interval. The multigrid principle is simple: throw them
away, and continue the integration with a smaller basis set and a larger time step.
The simplest choice is to halve the maximum wavenumber and quadruple the time step
at each stage because every grid point of the reduced grid is also a part of the original fine
grid. So that each of the high wavenumber Fourier components will decay rapidly, we need
a time-step no larger than one-half the explicit stability limit. If we somewhat arbitrarily
choose to reduce the error in all Fourier components between k_max/2 and k_max by exp(-10),
then we must march 40 time steps with τ = τ_min, and we will repeat this pattern of taking
40 time steps at each stage (Fig. 15.3).
Let K denote the multigrid iteration level. Then

k_max = 2^K        [largest wavenumber at level K]                  (15.48)

Table 15.3: A Selected Bibliography of Spectral Multigrid

References                                Comments
Zang & Wong & Hussaini (1982, 1984)       Incomplete LU factorizations as smoothers
Brandt & Fulton & Taylor (1985)           Periodic problems with meteorological applications
Streett & Zang & Hussaini (1985)          Transonic potential flow (aerodynamics)
Schaffer & Stenger (1986)                 sinc (Whittaker cardinal) basis
Phillips & Zang & Hussaini (1986)         Preconditioners
Zang & Hussaini (1986)                    Time-dependent Navier-Stokes
Erlebacher & Zang & Hussaini (1988)       Turbulence modeling
Nosenchuck & Krist & Zang (1988)          Parallel machine ("Navier-Stokes Computer")
Heinrichs (1988a)                         Line relaxation
Heinrichs (1988b)                         Mixed finite difference and Fourier methods
deVries & Zandbergen (1989)               Biharmonic equation
Heinrichs (1992a)                         Stokes flow in streamfunction formulation
Heinrichs (1993c)                         Navier-Stokes equation
Heinrichs (1993d)                         Reformulated Stokes equations

Figure 15.3: Schematic of multigrid iteration for the one-dimensional diffusion equation
with periodic boundary conditions. After a few iterations with the shortest time step, τ_min, the
coefficients of the upper half of the Fourier series have already converged to their exact
values. We therefore store coefficients 5, 6, 7, and 8 in the column vector which holds the
final answer, halve the basis set, quadruple the time step, and resume iterating. Once
wavenumbers 3 and 4 have relaxed to their asymptotic values, we halve the basis set and
increase the time step again, and so on.

is the largest wavenumber we keep on the K-th grid (and similarly for the y and z wavenumbers). The corresponding time step is

τ(K) = 4^{-K}      [time step at level K]                           (15.49)

A single grid using τ = τ_min requires 64 × 40 = 2560 time steps, and each step would
cost O(N_f) operations where N_f is the total number of grid points on the finest grid. With
multigrid, we need only 160 time steps [level K = 0 does not fit on the diagram], and the
work is

N_total ~ O( 40 N_f { 1 + 1/2 + 1/4 + 1/8 } ) operations            (15.50)

since each level uses a grid with only half as many points as its predecessor. The sum
{1 + 1/2 + 1/4 + 1/8 + . . . } is the geometric series which converges to 2. The total cost of
removing the error from all wavenumbers is at worst a factor of two greater than the cost of
relaxing to the correct solution for just the upper half of the wavenumber spectrum. Even
with kmax as small as 8, the savings is a factor of 32. In principle, the multigrid iteration

is cheaper than an unpreconditioned Richardson's iteration on a single grid by a factor of
O(N²).
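The bookkeeping above can be checked with a small sketch (illustrative, not from the text) that tracks the decay of each Fourier error component under explicit Euler time-stepping of u_t = u_xx + f: level K keeps wavenumbers up to k_max = 2^K and uses τ = 4^{-K}, half the explicit stability limit.

```python
# Follow the multigrid schedule for the periodic diffusion equation:
# 40 Euler steps per level, halving kmax and quadrupling tau at each stage.
K_finest = 3                    # kmax = 8 on the finest grid, as in the text
steps_per_level = 40
err = {k: 1.0 for k in range(1, 2**K_finest + 1)}   # unit error in every mode

work = 0.0                      # work in units of N_f grid points per step
for K in range(K_finest, -1, -1):
    kmax = 2**K
    tau = 4.0**(-K)             # half of the stability limit 2/kmax**2
    for _ in range(steps_per_level):
        for k in range(1, kmax + 1):
            err[k] *= (1.0 - tau * k**2)    # Euler damping factor for mode k
    work += steps_per_level * 2**(K - K_finest)   # level K holds N_f/2^(K_finest-K) points
    # modes in (kmax/2, kmax] are now converged; freeze them and halve the basis

print(max(abs(v) for v in err.values()))   # every mode damped below exp(-10)
print(work)                                # 40*(1 + 1/2 + 1/4 + 1/8) = 75 units
```

The work sum reproduces the geometric series in (15.50): the whole cascade costs less than twice the 40 fine-grid steps alone.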
In more realistic problems, variable coefficients in the differential equation couple the
high and low wavenumbers together. This makes it necessary to add a reverse cascade
in K: after the solution has been computed on the coarsest grid, one must interpolate to
a finer grid, apply a few more iterations, and repeat. Multigrid subroutines have rather
complicated control structures so that the program can iterate a sufficiently large number
of times on each level to ensure a successful transition to the next. It is usually unnecessary
to take 40 steps on each level when both a forward and backward cascade is used; three
iterations per level is more typical, but several fine-to-coarse-to-fine transitions are usual.
The preconditioned Richardson's iteration shares with multigrid the property that the
cost per iteration is directly proportional to the number of grid points; in one dimension the
finite difference preconditioner is so cheap to invert that it is
quite senseless to apply multigrid to a one-dimensional problem. In two dimensions, the
high cost of the finite difference preconditioning has inspired the approximate factorizations discussed in Sec. 15.5. Their condition number does increase with N, so multigrid
becomes increasingly superior as the number of grid points increases.
Note that multigrid is not a competitor to Richardson's iteration or preconditioning.
Rather, multigrid is a strategy for accelerating Richardson's iteration (with or without preconditioning) or other iterations at the expense of additional programming and storage.
For further information, one can consult Zang, Wong, and Hussaini (1982, 1984), Brandt,
Fulton, and Taylor (1985), and Canuto et al. (1988). A very readable introduction to multigrid, although it discusses finite differences only, is the tutorial by W. Briggs (1987).

15.8 The Minimum Residual Richardson's (MRR) Method
This improvement of the conventional Richardson's method, attributed to Y. S. Wong in
Zang, Wong, and Hussaini (1982), has the great virtue that it is a parameter-free algorithm
that does not require guessing the minimum and maximum eigenvalues of the iteration
matrix. Unlike the Chebyshev acceleration of Richardson's method, the MRR algorithm
does not perform poorly when the matrix has complex eigenvalues. Indeed, the only
requirement for convergence of the Minimum Residual Richardson's method is that the
eigenvalues of the (preconditioned) matrix lie strictly in the right half of the complex plane
(Eisenstat, Elman, and Schultz, 1983).
The basic step is identical with that of the preconditioned Richardson's iteration for
A u = g:

u^{n+1} = u^n + τ [ g - A u^n ]                                     (15.51)

Like the Chebyshev-accelerated Richardson's method, the MRR algorithm varies τ to optimize the rate of convergence. However, it does not set τ to a fixed value that requires
knowledge of the eigenvalues of A. Instead, the MRR varies τ on the fly, making a new
choice at each iteration, so as to minimize the residual of the next iterate.
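Minimizing the norm of the next residual, ||r^n − τ A r^n||², over τ gives the closed form τ_n = (A r^n, r^n)/(A r^n, A r^n), so no eigenvalue estimates are needed. A minimal unpreconditioned sketch (the test matrix and its scaling are invented for illustration; it is scaled so its eigenvalues lie in the right half-plane, the convergence condition quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
# Identity plus a small random perturbation: eigenvalues near 1, safely in
# the right half of the complex plane (illustrative assumption)
A = np.eye(N) + 0.2 * rng.standard_normal((N, N)) / np.sqrt(N)
g = rng.standard_normal(N)

u = np.zeros(N)
r = g - A @ u
for n in range(1000):
    Ar = A @ r
    tau = (Ar @ r) / (Ar @ Ar)      # minimum residual choice of tau_n
    u += tau * r                    # Richardson update with the optimal tau
    r -= tau * Ar                   # update r = g - A u without a fresh product
    if np.linalg.norm(r) < 1e-12 * np.linalg.norm(g):
        break

print(np.linalg.norm(g - A @ u))    # the residual has been driven to ~0
```

Each iteration costs a single extra matrix-vector product (Ar), which is the price of the parameter-free τ.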

