b_{jl} = B_{jl} ;   d_{jl} = D_{jl}                                             (15.41)

e_{jl} = E_{jl} − b_{jl} h_{j,l−1} − d_{jl} f_{j−1,l} − α { b_{jl} f_{j,l−1} + d_{jl} h_{j−1,l} }   (15.42)

where α = 0 for the first factorization, H_LU, and α = 1 for the second factorization, H_RS.

The non-zero elements of the upper triangular matrix (aside from the 1's on the diagonal) are

f_{jl} = F_{jl} / e_{jl} ;   h_{jl} = H_{jl} / e_{jl}                           (15.43)

Note that we must march through the grid diagonally from top left to bottom right so that

we calculate the elements of U before we need them for the computation of the diagonal

elements of L.
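The recurrence (15.41)–(15.43) is straightforward to code. The sketch below is a minimal Python illustration, assuming the stencil coefficients at grid point (j, l) are stored as 2-D arrays B, D, E, F, H (the storage layout is an assumption, not from the text); any sweep that visits (j, l−1) and (j−1, l) before (j, l) respects the data dependencies, so a simple lexicographic loop is used here in place of the diagonal march described above.

```python
import numpy as np

def incomplete_lu(B, D, E, F, H, alpha=1):
    """Five-diagonal incomplete factorization, Eqs. (15.41)-(15.43).

    B, D, E, F, H: 2-D arrays of stencil coefficients at grid point (j, l).
    alpha=0 gives the H_LU factorization, alpha=1 gives H_RS.
    """
    b = B.copy()                      # Eq. (15.41): b_jl = B_jl
    d = D.copy()                      #              d_jl = D_jl
    e = np.zeros_like(E)
    f = np.zeros_like(F)
    h = np.zeros_like(H)
    nj, nl = E.shape
    for j in range(nj):               # any sweep that computes (j, l-1) and
        for l in range(nl):           # (j-1, l) first satisfies Eq. (15.42)
            h_w = h[j, l-1] if l > 0 else 0.0   # h_{j,l-1}
            f_w = f[j, l-1] if l > 0 else 0.0   # f_{j,l-1}
            f_s = f[j-1, l] if j > 0 else 0.0   # f_{j-1,l}
            h_s = h[j-1, l] if j > 0 else 0.0   # h_{j-1,l}
            e[j, l] = (E[j, l] - b[j, l]*h_w - d[j, l]*f_s
                       - alpha*(b[j, l]*f_w + d[j, l]*h_s))   # Eq. (15.42)
            f[j, l] = F[j, l] / e[j, l]                       # Eq. (15.43)
            h[j, l] = H[j, l] / e[j, l]
    return b, d, e, f, h

# Example: 5-point Laplacian stencil (unit off-diagonals, -4 on the
# diagonal) on a 4 x 4 grid.
n = 4
ones = np.ones((n, n))
b, d, e, f, h = incomplete_lu(ones, ones, -4.0*ones, ones, ones, alpha=1)
print(e[0, 0])   # → -4.0 (no neighbors contribute at the corner point)
```

Note that only five arrays per grid point are ever stored, which is exactly the "no fill-in" property discussed next.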

What is striking about (15.41)–(15.43) is that there is no fill-in: the approximate factorization still gives only five non-trivial coefficients per grid point for L and U together. This is in marked contrast to the exact LU decomposition of H_FD, which can be computed via the block tridiagonal algorithm: the N × N blocks of the factored L and U matrices are dense, even though the blocks of H_FD are sparse, so that there are roughly 5N non-trivial matrix elements per grid point. Because the L and U matrices of the approximate factorizations are so sparse, the necessary backsolves can be performed in O(N²) operations; that is to say, the cost of inverting H_LU or H_RS is directly proportional to the total number of grid points. This is the best that we could hope for, and implies that inverting H will not be the rate-determining part of the iteration since evaluating Λu is always more expensive.²

² Unless the PDE is separable, in which case we should use the special spectral methods for this class of equations described in Sec. 15.11.

The remaining issue is: How much does incomplete factorization increase the number of iterations? The answer is controlled by the "condition number", κ ≡ λ_max/λ_min: the number of iterations grows with κ. As shown above, κ ≤ 2.4 for a second order equation with finite difference preconditioning. Tables 15.1 and 15.2 show that κ is much larger for both approximate factorizations than for H_FD, and worse, κ also increases with N.

Table 15.1: Extreme eigenvalues for the two-dimensional Chebyshev discretization of ∇²u = f. (From Zang et al., 1984)

         H_FD^{-1}           H_LU^{-1}           H_RS^{-1}
N      λ_min   λ_max       λ_min   λ_max       λ_min   λ_max
4      1.000   1.757       0.929   1.717       1.037   1.781
8      1.000   2.131       0.582   2.273       1.061   2.877
16     1.000   2.305       0.224   2.603       1.043   4.241
24     1.000   2.361       0.111   2.737       1.031   5.379

For calculations on a single grid, H_RS is clearly superior to H_LU. For N = 24, the asymptotic convergence rate for H_RS is about 67% of that for the finite difference preconditioning. Because it is much cheaper per iteration, approximate factorization can drastically reduce the cost for boundary value problems in two or more spatial dimensions.

Canuto et al. (1988) observe that these incomplete-LU decompositions perform poorly on vector and parallel machines because they require a lot of recursive [sequential] calculations. They suggest that ADI-type factorizations might be better, but so far, little has been done.

Table 15.2: Condition number κ for the preconditioned Chebyshev operator in two dimensions [κ ≡ λ_max/λ_min]

            Single-Grid                    Multigrid
N      H_LU^{-1}L   H_RS^{-1}L       H_LU^{-1}L   H_RS^{-1}L
4        1.85         1.72              –            –
8        3.91         2.71              1.79         2.07
16      11.62         4.07              2.12         2.92
24      24.66         5.22              2.26         3.79

Deville and Mund (1984, 1985, 1990, 1991, 1992), Deville, Mund and Van Kemenade (1994), and others have shown that finite elements are also a very effective preconditioner for spectral methods. One may use finite elements on triangular elements as the preconditioner even though the spectral approximation is defined on a quadrilateral or a set of quadrilateral subdomains. The condition number is even smaller than with finite difference preconditioning. Furthermore, efficient parallelization of finite elements is a well-researched and well-understood task.

We shall not discuss finite element preconditioning in detail because (i) the principles are exactly the same as for finite difference preconditioning and (ii) a detailed treatment would be too great a digression into finite elements. This lack of space, however, should not be interpreted as implying that finite elements are in any way inferior to finite differences for preconditioning.

15.6 Raising the Order Through Preconditioning

In the previous two sections, finite difference and finite element methods have been humble servants of the spectral method. One can also interpret the same preconditioned algorithm from a perspective in which the spectral method is subservient.

Suppose one has a code to solve a boundary value problem using a second order

method. How can one obtain high accuracy without increasing the number of grid points

to a ridiculous extreme? Or even compute a crude answer when the low order scheme

needs more degrees of freedom than will ¬t in the available memory?

The answer is to write one additional subroutine to evaluate the residual of the boundary value problem using a high order method, such as a pseudospectral algorithm. If we set up an iteration, repeatedly calling the low order BVP-solver with the output of the high order residual as the inhomogeneous term in the boundary value problem, then the iteration will converge to an answer of high order accuracy even though all but the residual-evaluator uses only low order schemes. The overall procedure is identical with the preconditioned spectral iteration described in the two previous sections.
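As a concrete sketch of this idea, the toy Python example below solves u_xx = f with periodic boundary conditions. The "low order code" is the second order finite difference Laplacian (inverted mode-by-mode through its Fourier symbol, which is exact for the periodic FD system), while the single extra subroutine evaluates the residual pseudospectrally. The forcing, grid size, and damping factor are illustrative choices, not from the text; τ = 0.6 is safe here because the eigenvalues of H_FD^{-1}Λ lie in [1, π²/4] for this problem.

```python
import numpy as np

# Toy problem: u_xx = f on [0, 2*pi), periodic; exact u = sin x + cos 2x.
N = 32
x = 2*np.pi*np.arange(N)/N
h = 2*np.pi/N
k = np.fft.fftfreq(N, d=1.0/N)               # integer wavenumbers
f = -np.sin(x) - 4*np.cos(2*x)
u_exact = np.sin(x) + np.cos(2*x)

def spectral_residual(u):
    """High order piece: residual f - u_xx via Fourier differentiation."""
    uxx = np.fft.ifft(-(k**2)*np.fft.fft(u)).real
    return f - uxx

def fd_solve(r):
    """Low order piece: invert the 2nd order periodic FD Laplacian,
    whose Fourier symbol is -2(1 - cos(k h))/h^2 (k = 0 mode zeroed)."""
    symbol = -2.0*(1.0 - np.cos(k*h))/h**2
    rhat = np.fft.fft(r)
    uhat = np.zeros_like(rhat)
    uhat[1:] = rhat[1:]/symbol[1:]
    return np.fft.ifft(uhat).real

# Preconditioned Richardson iteration: the low order solver is called
# repeatedly with the high order residual as its inhomogeneous term.
u = np.zeros(N)
for _ in range(30):
    u = u + 0.6*fd_solve(spectral_residual(u))

err = np.max(np.abs(u - u_exact))
print(err)
```

The loop never touches the FD solver's internals; only the residual subroutine "knows" the answer is spectral, which is the encapsulation argued for above.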

In chess, the lowly pawn can be promoted to a queen, the most powerful piece, after

reaching the last row of the chessboard. In numerical analysis, preconditioned iterations

provide a way to promote a low order method to spectral accuracy. This promotion does

not require rewriting the low order code or negotiating a thicket of enemy chess pieces. It

merely needs a single subroutine that will take the grid point values of the unknown as

input, and return the residual as the output. The spectral algorithms can be completely

isolated and encapsulated in this single subroutine or module.

From this perspective, EVERY LOW ORDER BVP-SOLVER CONTAINS A SPECTRAL SOLVER

WITHIN IT.

15.7 Multigrid: An Overview

The reason that an unpreconditioned Richardson's iteration is so expensive is that the total number of iterations,

N_iter = T/τ                                                    (15.44)

is large, where τ is the time step and T is the total integration time, that is, the time required for the solution of the diffusion equation to relax to the steady state. The difficulty is that different components of the flow drive the numerator and denominator of (15.44) in opposite directions. To see this, consider the forced diffusion equation

u_t = u_xx + f                                                  (15.45)

with periodic boundary conditions, u(x + 2π) = u(x).

The time step τ must be very small because the explicit, one-step Euler's method is unstable unless

τ < 2/k_max²                                                    (15.46)

where k_max is the maximum x wavenumber. This constraint on τ is due to the high wavenumbers in the solution.

The time interval T is large because the low wavenumbers decay very slowly:

u = a_1(0) e^{−t} cos(x) + b_1(0) e^{−t} sin(x) + ...           (15.47)

so T > 5 is necessary for even two decimal places of accuracy. The total number of iterations, T/τ, is enormous both because the numerator is large and because the denominator is small.
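For a hypothetical resolution of k_max = 64, the two effects multiply as in the back-of-the-envelope sketch below (the numbers are illustrative, not from the text):

```python
kmax = 64                 # illustrative resolution
tau = 2.0 / kmax**2       # explicit Euler stability limit, Eq. (15.46)
T = 5.0                   # e^{-5} ~ 0.007: two decimal places of decay
n_iter = int(T / tau)
print(n_iter)             # → 10240 iterations on a single grid
```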

Multigrid exploits the fact that these two problems, large T and small τ, come from different parts of the wavenumber spectrum. Since the high wavenumber components decay very rapidly, the error in the large k Fourier coefficients has disappeared in only a fraction of the total integration interval. The multigrid principle is simple: throw them away, and continue the integration with a smaller basis set and a larger time step.

The simplest choice is to halve the maximum wavenumber and quadruple the time step at each stage because every grid point of the reduced grid is also a part of the original fine grid. So that each of the high wavenumber Fourier components will decay rapidly, we need a time step no larger than one-half the explicit stability limit. If we somewhat arbitrarily choose to reduce the error in all Fourier components between k_max/2 and k_max by exp(−10), then we must march 40 time steps with τ = τ_min, and we will repeat this pattern of taking 40 time steps at each stage (Fig. 15.3).
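The schedule just described can be sketched in a few lines of Python. For the periodic diffusion equation each Fourier mode relaxes independently (û_k obeys û_k' = −k²û_k + f̂_k, with steady state f̂_k/k²), so the model below applies 40 Euler steps per level with τ = 1/k_max² (half the stability limit), then freezes the upper half of the spectrum and coarsens. The unit forcing is an arbitrary illustrative choice.

```python
import numpy as np

def multigrid_relax(kmax, steps=40):
    """Model multigrid cascade for u_t = u_xx + f, modes k = 1..kmax."""
    k = np.arange(1, kmax + 1, dtype=float)
    f_hat = np.ones_like(k)          # arbitrary illustrative forcing
    u_hat = np.zeros_like(k)         # initial guess: zero
    exact = f_hat / k**2             # steady state of each mode
    top = kmax
    while top >= 1:
        tau = 1.0 / top**2           # half the stability limit 2/top^2
        active = k <= top            # frozen modes keep converged values
        for _ in range(steps):       # 40 Euler steps at this level
            u_hat[active] += tau * (f_hat[active]
                                    - k[active]**2 * u_hat[active])
        top //= 2                    # halve kmax; tau quadruples next level
    return np.max(np.abs(u_hat - exact))

err = multigrid_relax(16)
print(err)
```

Each mode's last active level has top/2 < k ≤ top, so its error shrinks by at least (3/4)^40 ≈ exp(−11) there, consistent with the exp(−10) target above.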

Let K denote the multigrid iteration level. Then

k_max = 2^K          [largest wavenumber at level K]             (15.48)

is the largest wavenumber we keep on the K-th grid (and similarly for the y and z wavenumbers), and

τ(K) = 1/2^{2K}      [time step at level K]                      (15.49)

Table 15.3: A Selected Bibliography of Spectral Multigrid

References                            Comments
Zang, Wong & Hussaini (1982, 1984)    Incomplete LU factorizations as smoothers
Brandt, Fulton & Taylor (1985)        Periodic problems with meteorological applications
Streett, Zang & Hussaini (1985)       Transonic potential flow (aerodynamics)
Schaffer & Stenger (1986)             Sinc (Whittaker cardinal) basis
Phillips, Zang & Hussaini (1986)      Preconditioners
Zang & Hussaini (1986)                Time-dependent Navier-Stokes
Erlebacher, Zang & Hussaini (1988)    Turbulence modeling
Nosenchuck, Krist & Zang (1988)       Parallel machine ("Navier-Stokes Computer")
Heinrichs (1988a)                     Line relaxation
Heinrichs (1988b)                     Mixed finite difference and Fourier methods
deVries & Zandbergen (1989)           Biharmonic equation
Heinrichs (1992a)                     Stokes flow in streamfunction formulation
Heinrichs (1993c)                     Navier-Stokes equation
Heinrichs (1993d)                     Reformulated Stokes equations

Figure 15.3: Schematic of multigrid iteration for the one-dimensional diffusion equation with periodic boundary conditions. After a few iterations with the shortest time step, τ_min, the coefficients of the upper half of the Fourier series have already converged to their exact values. We therefore store coefficients 5, 6, 7, and 8 in the column vector which holds the final answer, halve the basis set, increase the time step by a factor of four, and resume iterating. Once wavenumbers 3 and 4 have relaxed to their asymptotic values, we halve the basis set and increase the time step again, and so on.

A single grid using τ = τ_min requires 64 × 40 = 2560 time steps, and each step would cost O(N_f) operations where N_f is the total number of grid points on the finest grid. With multigrid, we need only 160 time steps [level K = 0 does not fit on the diagram], and the work is

N_total ∼ O( 40 [ 1 + 1/2 + 1/4 + 1/8 ] N_f )  operations        (15.50)

since each level uses a grid with only half as many points as its predecessor. The sum {1 + 1/2 + 1/4 + 1/8 + ...} is a geometric series which converges to 2. The total cost of removing the error from all wavenumbers is at worst a factor of two greater than the cost of relaxing to the correct solution for just the upper half of the wavenumber spectrum. Even with k_max as small as 8, the savings is a factor of 32. In principle, the multigrid iteration


is cheaper than an unpreconditioned Richardson's iteration on a single grid by a factor of O(N²).

In more realistic problems, variable coefficients in the differential equation couple the high and low wavenumbers together. This makes it necessary to add a reverse cascade in K: after the solution has been computed on the coarsest grid, one must interpolate to a finer grid, apply a few more iterations, and repeat. Multigrid subroutines have rather complicated control structures so that the program can iterate a sufficiently large number of times on each level to ensure a successful transition to the next. It is usually unnecessary to take 40 steps on each level when both a forward and backward cascade is used; three iterations per level is more typical, but several fine-to-coarse-to-fine transitions are usual.

The preconditioned Richardson's iteration shares with multigrid the property that the NUMBER OF ITERATIONS is INDEPENDENT OF THE GRID SIZE. Consequently, it is probably quite senseless to apply multigrid to a one-dimensional problem. In two dimensions, the high cost of the finite difference preconditioning has inspired the approximate factorizations discussed in Sec. 15.5. Their condition number does increase with N, so multigrid becomes increasingly superior as the number of grid points increases.

Note that multigrid is not a competitor to Richardson's iteration or preconditioning. Rather, multigrid is a strategy for accelerating Richardson's iteration (with or without preconditioning) or other iterations at the expense of additional programming and storage.

For further information, one can consult Zang, Wong, and Hussaini (1982, 1984), Brandt, Fulton, and Taylor (1985), and Canuto et al. (1988). A very readable introduction to multigrid, although it discusses finite differences only, is the tutorial by W. Briggs (1987).

15.8 The Minimum Residual Richardson's (MRR) Method

This improvement of the conventional Richardson's method, attributed to Y. S. Wong in Zang, Wong, and Hussaini (1982), has the great virtue that it is a parameter-free algorithm that does not require guessing the minimum and maximum eigenvalues of the iteration matrix. Unlike the Chebyshev acceleration of Richardson's method, the MRR algorithm does not perform poorly when the matrix has complex eigenvalues. Indeed, the only requirement for convergence of the Minimum Residual Richardson's method is that the eigenvalues of the (preconditioned) matrix lie strictly in the right half of the complex plane (Eisenstat, Elman, and Schultz, 1983).

The basic step is identical with that of the preconditioned Richardson's iteration for A u = g:

u^{n+1} = (I − τ A) u^n + τ g                                   (15.51)

Like the Chebyshev-accelerated Richardson's method, the MRR algorithm varies τ to optimize the rate of convergence. However, it does not set τ to a fixed value that requires knowledge of the eigenvalues of A. Instead, the MRR varies τ on the fly, making a new choice at each iteration, so as to minimize the residual of the next iterate.
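The residual-minimizing step length has the closed form τ_n = (r_n, A r_n)/(A r_n, A r_n) where r_n = g − A u_n; this is the standard formula for minimal-residual iterations, stated here as an assumption since the text above does not spell it out. A minimal unpreconditioned sketch (H = I), on a small dense test matrix whose eigenvalues lie well inside the right half-plane:

```python
import numpy as np

def mrr_solve(A, g, tol=1e-10, maxit=500):
    """Minimum Residual Richardson: u <- u + tau_n * r_n, with tau_n
    chosen each iteration to minimize ||r_n - tau_n * A r_n||_2,
    i.e. tau_n = <r_n, A r_n> / <A r_n, A r_n>."""
    u = np.zeros_like(g)
    r = g - A @ u
    for _ in range(maxit):
        Ar = A @ r
        tau = (r @ Ar) / (Ar @ Ar)   # residual-minimizing step length
        u = u + tau * r
        r = r - tau * Ar             # updated residual of the next iterate
        if np.linalg.norm(r) < tol:
            break
    return u

# Illustrative test matrix: symmetric positive definite, so its
# eigenvalues satisfy the right-half-plane convergence requirement.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0*np.eye(20)
g = rng.standard_normal(20)
u = mrr_solve(A, g)
err = np.linalg.norm(A @ u - g)
print(err)
```

No eigenvalue estimates appear anywhere: τ is recomputed from the current residual at every step, which is the parameter-free property emphasized above.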