(λ_i − q) b_i = g_i ,    i = 0, . . . , N + 1        (15.95)

After solving (15.95), we obtain a from b by multiplying the latter by E.

Once the eigenvectors have been computed (in O(N³) operations), each solution for new values of q and/or f costs only O(6N²) operations, that is, three multiplications of a column vector by a square matrix. In the semi-implicit time-stepping algorithm of Orszag and Patera (1982), this tactic is applied in the radial direction. It minimizes storage compared to the more obvious technique of Gaussian elimination because the radial boundary value problems are identical for different polar and axial wavenumbers except for a different value of the constant q. Thus, they need to store only two matrices, E_x and its inverse, whereas with Gaussian elimination it would be necessary to store a matrix of the same size for each pair of polar and axial wavenumbers (several hundred in all!).
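The precompute-once, solve-many pattern described above can be sketched as follows. This is a hedged illustration, not Orszag and Patera's actual code: the dense matrix `A` stands in for the discretized radial operator, and the sizes are arbitrary.

```python
import numpy as np

def precompute_eigensystem(A):
    """One-time O(N^3) step: eigendecomposition A = E diag(lam) E^{-1}."""
    lam, E = np.linalg.eig(A)
    Einv = np.linalg.inv(E)
    return lam, E, Einv

def solve_shifted(lam, E, Einv, q, f):
    """Solve (A - q I) a = f in O(N^2): matrix-vector products plus a
    pointwise division in the eigenvector basis."""
    g = Einv @ f          # transform right-hand side: g = E^{-1} f
    b = g / (lam - q)     # uncoupled scalar equations (lam_i - q) b_i = g_i
    return E @ b          # transform back: a = E b

# usage: many (q, f) pairs amortize the single eigendecomposition
rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N))
lam, E, Einv = precompute_eigensystem(A)
f = rng.standard_normal(N)
a = solve_shifted(lam, E, Einv, 0.5, f)
assert np.allclose((A - 0.5 * np.eye(N)) @ a, f)
```

The same `lam, E, Einv` triple is reused for every new shift q and right-hand side f, which is exactly why only two matrices need be stored.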

This trick is also a direct method of attacking two-dimensional separable problems because the diagonalization still works even if q is a differential operator rather than a constant, so long as the operator involves only the other coordinate, y. If this condition is met, then E_x and λ_x will be independent of y. For example, if

q ≡ ∂_y [ s(y) ∂_y ] + t(y),        (15.96)

then if we discretize only the x-derivatives, (15.95) becomes a set of (N + 1) uncoupled ordinary differential equations in y:

[ s(y) b_{i,y} ]_y + [ t(y) − λ_i ] b_i(y) = −g_i(y)        (15.97)

This is precisely what is generated by the analytical method of separation of variables, applied to the original PDE. The condition that the discretized x-derivative matrix can be diagonalized independently of y, which demands that the operator q involve only y-derivatives and functions of y, is just the condition that the original problem is separable. In fact, the Haidvogel-Zang direct method is separation-of-variables.

An obvious question is: why do we need this post-collocation variant of separation of variables? The answer is that unless the boundary conditions are spatially periodic, the separation-of-variables eigenfunction series will have a slow, algebraic rate of convergence. To obtain high accuracy, we need Chebyshev polynomials in x.4

4 In fairness, sometimes Chebyshev series converge as slowly as eigenseries for time-dependent problems when the initial conditions fail to satisfy sufficient "compatibility conditions" at the boundaries (Boyd and Flyer, 1998).

CHAPTER 15. MATRIX-SOLVING METHODS

316

One way of looking at the Haidvogel-Zang procedure is that it is a method of solving the set of (N + 1) coupled ordinary differential equations for the a_i by making a "smart" change-of-basis [in the space of (N + 1)-dimensional column vectors] to a new set of coefficients, the b_i, such that the differential equations uncouple.

Solving N ordinary differential equations is still moderately expensive, but for Poisson's equation (or any which is constant coefficient in y), Haidvogel and Zang note that Clenshaw's method (previous section!) applies at a cost of N operations per ODE in y. Thus, after an O(N³) preprocessing step, we can solve a separable PDE for different f(x, y) at a cost of O(N²) per solution. The efficiency of the Haidvogel and Zang procedure can be extended to equations with y-dependent coefficients by using the preconditioned Richardson's iteration as described in earlier sections. (Note that variable x-coefficients are allowed in either case.)

The preprocessing step, which is a one-dimensional eigenproblem, costs O(N³) operations, whereas fast direct finite difference methods require only O(N² log₂ N) operations, where the tensor product grid has a total of N² points. However, on a 1997 personal computer, the cost of solving an N = 100 one-dimensional eigenproblem was only half a second! The preprocessing step was expensive in 1979, but not today.

If the separable elliptic problem is part of a semi-implicit time-marching algorithm, then the cost of the O(N³) preprocessing step for the spectral method is amortized over thousands of time steps. The actual cost per time step is merely that for a single backsolve, which is O(N²), directly proportional to the number of grid points, for both spectral and finite difference methods.

Haldewang, Labrosse, Abboudi and Deville (1984) compared three algorithms for the Helmholtz equation. The Haidvogel-Zang strategy, which diagonalizes in two dimensions and solves tridiagonal systems in the third, proved the fastest. However, Haldewang et al. found moderately severe roundoff problems with this method, losing five decimal digits for N as small as 64.

Full diagonalization (in all three dimensions) was slightly slower but easier to program.

This algorithm became very popular in France and perhaps deserves a wider range of

applications.5
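Full diagonalization is easy to sketch in two dimensions. The following is a minimal illustration of the idea, not Haldewang et al.'s code: 1-D second-difference matrices stand in for spectral derivative operators, and the function name and sizes are assumptions.

```python
import numpy as np

def solve_by_full_diagonalization(Lx, Ly, F):
    """Solve Lx U + U Ly^T = F (discretized u_xx + u_yy = f) by
    diagonalizing in BOTH directions: U = Ex B Ey^T, where
    B_ij = G_ij / (lam_i + mu_j)."""
    lam, Ex = np.linalg.eig(Lx); Exi = np.linalg.inv(Ex)
    mu,  Ey = np.linalg.eig(Ly); Eyi = np.linalg.inv(Ey)
    G = Exi @ F @ Eyi.T                      # RHS in the joint eigenbasis
    B = G / (lam[:, None] + mu[None, :])     # uncoupled scalar equations
    return Ex @ B @ Ey.T                     # transform back

# usage: second-difference matrix as a stand-in 1-D operator
N = 16
h = 1.0 / (N + 1)
D2 = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2
F = np.random.default_rng(1).standard_normal((N, N))
U = solve_by_full_diagonalization(D2, D2, F)
assert np.allclose(D2 @ U + U @ D2.T, F)
```

After the 1-D eigensystems are precomputed, each new right-hand side F costs only a handful of matrix-matrix multiplies, which is the appeal of the method despite its roundoff sensitivity.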

The third algorithm was a finite difference-preconditioned iteration. This is cheap for separable problems because the preconditioning can be done using the special "fast direct methods" for separable problems, which cost only O(N³ log₂(N)) operations on an N³ grid in three dimensions.6 The preconditioned iteration was an order of magnitude slower than the Haidvogel-Zang procedure at all N, but the fast direct finite difference solver can be used as a preconditioner for nonseparable problems. The iteration loses little accuracy to round-off error even at very large N.

Liffman (1996) extended the Haidvogel-Zang eigenanalysis to incorporate very general (Robin) boundary conditions. This approach has been widely used by the Nice group under the name of "matrix diagonalization" since Ehrenstein and Peyret (1989).

Patera (1986) combined the Haidvogel-Zang algorithm with static condensation to create a spectral element Poisson solver with an O(N^{5/2}) operation count, but this algorithm has largely been replaced by multigrid.

Siyyam and Syam (1997) offer a Chebyshev-tau alternative to the Haidvogel-Zang technique. Their experiments show that their method is faster, at least for large N, but their algorithm has not been extended to PDEs with spatially-varying coefficients.

5 Because there are often several algorithms of roughly equal effectiveness for a given problem, numerical analysis is prone to national and individual fashions that are determined more by accidents of education and exposure than by rigorous mathematical justification.

6 The FISHPACK library, written by John Adams and available through the National Center for Atmospheric

Research, is an excellent collection of fast direct methods.


Ierley (1997) has developed a similar fast algorithm for the Laplace operator in an arbitrary number of space dimensions. One key ingredient is to use Jacobi polynomials rather than Chebyshev polynomials.

J. Shen (1994a, 1995b) and Lopez and Shen (1998) developed Legendre-Galerkin schemes which cost O(N³) operations for second and fourth order constant coefficient elliptic equations. With the basis φ_j(x) = L_{j+2}(x) − L_j(x), the weak form of the second derivative is a diagonal matrix. Shen developed similar Chebyshev algorithms in Shen (1995a). Both his Chebyshev and Legendre forms have condition numbers of O(N⁴) for fourth order problems, versus O(N⁸) for some other spectral algorithms, so that it is possible to solve the biharmonic equation with O(500) unknowns in each coordinate to high accuracy. His Chebyshev algorithm costs O(N⁴) operations for fourth order problems, but Bjørstad and Tjøstheim (1997) developed an O(N³) modification and made a number of further improvements in both the Legendre and Chebyshev algorithms for fourth order problems.

Braverman, Israeli, Averbuch and Vozovoi (1998) developed a Poisson solver in three dimensions which employs a Fourier basis even though the boundary conditions are non-periodic. A clever regularization method restores an exponential rate of convergence, which would normally be lost by non-periodic application of a Fourier series.

15.12 Fast Iterations for Almost Separable PDEs

D'yakonov (1961) and Concus and Golub (1973) noted that non-separable elliptic equations can be solved very efficiently by using an iteration in which a separable differential equation is solved at each step. (Obviously, the coefficients of the separable BVP are chosen to approximate those of the nonseparable problem as closely as possible.)
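In one dimension the structure of such an iteration can be sketched as follows. This is a hedged illustration, not D'yakonov's or Concus and Golub's actual algorithm: a constant-coefficient Helmholtz problem plays the role of the "separable" solve, the coefficient `d(x)`, the iteration count, and the grid size are all arbitrary assumptions, and `np.linalg.solve` stands in for a fast solver.

```python
import numpy as np

# Solve -u'' + d(x) u = f by repeatedly solving the constant-coefficient
# ("separable") problem -u'' + dbar*u = rhs, which a fast solver could handle.
N = 64
h = 1.0 / (N + 1)
x = np.linspace(h, 1.0 - h, N)
D2 = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2

d = 1.0 + 0.3 * np.sin(2 * np.pi * x)   # mildly varying coefficient (assumed)
f = np.ones(N)
dbar = d.mean()                          # constant-coefficient approximation
A_sep = -D2 + dbar * np.eye(N)           # stand-in for the fast separable solve

u = np.zeros(N)
for _ in range(30):
    # move the non-separable part of the operator to the right-hand side
    u = np.linalg.solve(A_sep, f + (dbar - d) * u)

A_full = -D2 + np.diag(d)
assert np.allclose(A_full @ u, f, atol=1e-8)   # converged to the true problem
```

The closer d(x) is to its mean, the faster the iteration converges, which is the sense in which the method favors "almost separable" problems.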

Several groups have developed efficient spectral algorithms for nonseparable elliptic equations using a D'yakonov/Concus-Golub iteration. Guillard and Desideri (1990) combined the Minimum Residual Richardson's iteration with two different spectral preconditioners that both solved separable problems. The more efficient preconditioner required computing eigenvalues and eigenvectors in a preprocessing step, as in the Haidvogel and Zang procedure. Strain (1994) solved non-separable periodic problems with a Fourier basis, where the solution of the separable differential equation is almost trivial. Zhao and Yedlin (1994) showed that the efficiency of solve-a-separable-PDE iterations is as good as multigrid, at least sometimes. Dimitropoulos and Beris (1997) found that the combination of a Concus-Golub iteration with a fast spectral Helmholtz solver did not always converge for problems with strongly varying coefficients. However, when this was embedded as an inner iteration within the biconjugate gradient method, BiCGstab(m), as the outer iteration, the result was a very robust and efficient algorithm. Typically only two Concus-Golub iterations were needed between each outer iteration; the biconjugate gradient method never needed more than nine iterations.

The title of this section, "almost separable", is a little misleading. Although these iterations converge most rapidly if the PDE is only a slight perturbation of a separable problem, the examples of Zhao and Yedlin, Strain, and Dimitropoulos and Beris show that often the D'yakonov/Concus-Golub algorithm converges very rapidly even when the BVP only vaguely resembles a separable problem. This is especially true when the iteration is combined with other convergence-accelerating strategies, such as the biconjugate gradient method used as an outer iteration by Dimitropoulos and Beris.


15.13 Positive Definite and Indefinite Matrices

As noted in Sec. 2, it is very important that the pseudospectral matrix be positive definite because most iterations are analogous to time-integration of the diffusion equation

u_t = −Λ u + f        (15.98)

where Λ is the square matrix which is the discretization of the differential operator. If Λ has only positive eigenvalues, then the solution of (15.98) will decay monotonically to the desired steady state, which solves Λu = f. However, if Λ is indefinite, that is, has eigenvalues of both signs, then the negative-eigenvalue modes will grow exponentially with time and the iteration will diverge.

This would seem to be a very serious restriction because even so simple a problem as the one-dimensional Helmholtz equation

−u_xx − q u = f(x),    u(0) = u(π) = 0        (15.99)

has the eigenvalues

λ_n = n² − q,    n = 1, 2, . . .        (15.100)

The lowest eigenvalue is negative for q > 1, two eigenvalues are negative for q > 4, three for q > 9, and so on. Richardson's iteration would diverge.
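A quick numerical check of this divergence looks at the spectral radius of the Richardson iteration matrix I − τΛ. This is a sketch with second differences standing in for the pseudospectral matrix; the grid size and the time step τ are arbitrary choices.

```python
import numpy as np

# Discretize -u_xx - q u on [0, pi] with homogeneous Dirichlet conditions,
# using second differences as a stand-in for the pseudospectral matrix.
N = 50
h = np.pi / (N + 1)
D2 = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2

def spectral_radius(q, tau):
    """Largest |eigenvalue| of the Richardson iteration matrix I - tau*A."""
    A = -D2 - q * np.eye(N)
    return max(abs(np.linalg.eigvals(np.eye(N) - tau * A)))

tau = 1.0e-4
assert spectral_radius(0.5, tau) < 1.0   # q < 1: definite, iteration converges
assert spectral_radius(8.0, tau) > 1.0   # q > 1: indefinite, iteration diverges
```

No choice of τ rescues the indefinite case: any negative eigenvalue of A makes some eigenvalue of I − τA exceed one in magnitude.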

Happily, both preconditioning and multigrid usually solve the problem of a finite number of small eigenvalues of the wrong sign. The reason that preconditioning is effective is that the offending modes are the ones with the simplest spatial structure. If q = 8, for example, only the modes sin(x) and sin(2x) are unstable. However, these modes are precisely the ones which will be accurately resolved by both the finite difference and pseudospectral approximations. Consequently, sin(x) and sin(2x) (or more precisely, the column vectors containing the grid point values of these functions) are both eigenvectors of the pre-conditioned matrix H⁻¹Λ with eigenvalues approximately equal to 1.

There is a snare: when q is very close to n², one eigenvalue of Λ may be very, very close to zero, smaller in magnitude than the small error in the finite difference approximation of this same mode. Canuto et al. (1988, pg. 143) show that the pre-conditioned eigenvalue is

λ_n^(p) = (n² − q) / [ n² sin²(n∆x/2)/(n∆x/2)² − q ]        (15.101)

For fixed q and n, λ_n^(p) → 1 in the limit ∆x → 0. However, for fixed ∆x, there is always a range of q sufficiently close to n² so that λ_n^(p) is negative and Richardson's iteration will then diverge for any time step.
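Formula (15.101) is easy to explore numerically. In this sketch, `lam_p` is a hypothetical helper (not from the source) that evaluates the right-hand side of (15.101); the particular values of n, q, and ∆x are illustrative.

```python
import numpy as np

def lam_p(n, q, dx):
    """Preconditioned eigenvalue (15.101): the finite difference operator
    has eigenvalue n^2 sin^2(n dx/2)/(n dx/2)^2 in place of n^2."""
    fd = n**2 * np.sin(n * dx / 2)**2 / (n * dx / 2)**2
    return (n**2 - q) / (fd - q)

# For fixed q and n, lam_p -> 1 as dx -> 0:
assert abs(lam_p(2, 1.0, 1.0e-3) - 1.0) < 1.0e-5
# For fixed dx, a q trapped between the FD eigenvalue and n^2 makes lam_p < 0:
assert lam_p(2, 3.8, 0.5) < 0.0
```

The danger zone is the narrow interval between the finite difference eigenvalue and n² itself, which shrinks to zero width as ∆x → 0.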

Since the grid spacing ∆x is always small, one has to be rather unlucky for a pre-conditioned iteration to diverge, but clearly it can happen. It is encouraging, however, that preconditioning will remove the problem of matrix indefiniteness most of the time.

Multigrid is intrinsically just an efficient way of performing Richardson's iteration. It will therefore diverge due to matrix indefiniteness whenever Richardson's would, too. However, a common practice in multigrid, even when the matrix is positive definite, is to apply Gaussian elimination instead of iteration on the coarsest grid. The obvious reason is that when the coarse grid contains only nine points, Gaussian elimination is ridiculously


cheap. The subtle, empirical reason is that using direct methods on the coarsest grid seems

to give additional stability and improved convergence rate to the whole iteration.

When the matrix is indefinite but has only one or two negative eigenvalues, this practice of Gaussian elimination on the coarsest grid is a life-saver. Although the coarse grid cannot resolve the high modes of the operator, it can resolve, at least crudely, the lowest one or two modes which would otherwise make the iteration diverge.

A little care is needed. If there are several unstable modes, one needs a coarsest grid which has at least a moderate number of points.

There is, alas, a small class of problems for which neither multigrid nor preconditioning will help: those with an infinite number of eigenvalues of negative sign. An example is Laplace's Tidal equation for waves in a semi-infinite atmosphere (Chapman and Lindzen, 1970). The tidal equation is very messy because it includes complicated trigonometric terms due to the earth's sphericity. However,

u_yy + (ω² − y²) u_zz = f(y, z)        (15.102)

u(±L, z) = 0,   u(y, 0) = 0,   u(y, z) bounded as z → ∞        (15.103)

displays the same qualitative behavior.

Separation-of-variables gives the latitudinal eigenequation

u_yy + λ (ω² − y²) u = 0,        (15.104)

the parabolic cylinder equation. If the boundary L > ω, then (15.104) has an infinite number of solutions with positive eigenvalues. The eigenmodes oscillate on y ∈ [−ω, ω] and decay exponentially for larger y; they may be accurately approximated by Hermite functions (as is also true of the corresponding tidal modes). The vertical structure of these modes is oscillatory.

There is also an infinite number of modes with negative eigenvalues. These oscillate on the intervals y ∈ [−L, −ω] and y ∈ [ω, L] and are exponentially small at y = 0. These modes decay exponentially as z → ∞.

Eq. (15.102) is rather peculiar: an equation of mixed elliptic-hyperbolic type. The boundary conditions are a little peculiar, too, in that either oscillatory or exponential behavior is allowed as z → ∞. Nevertheless, the linear wave modes of a rotating, semi-infinite atmosphere are described by a mixed elliptic-hyperbolic equation similar to (15.102), and the modes really do fall into two infinite classes: one with λ > 0 and the other with λ < 0. No iterative method has ever been successfully applied to these problems. Lindzen (1970), Schoeberl and Geller (1977) and Forbes and Garrett (1976) have all obtained good results by combining finite differences with banded Gaussian elimination.

Navarra (1987) combined finite differences in z with spherical harmonics in latitude. By applying Arnoldi's matrix-solving iteration, Navarra solved mixed-type boundary value problems with as many as 13,000 unknowns.

In principle, indefiniteness may be cured by squaring the matrix, i.e.,

Λ^T Λ u = Λ^T f        (15.105)

where superscript "T" denotes the matrix transpose. The disadvantage of matrix-squaring is that if the condition number of Λ is O(N²), then the condition number of the positive definite matrix Λ^T Λ is O(N⁴).
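A minimal numerical illustration of (15.105) follows; this is a sketch, with the 6×6 size and the chosen spectrum as arbitrary assumptions.

```python
import numpy as np

# Build a symmetric indefinite matrix with a known spectrum.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([-3.0, -1.0, 2.0, 4.0, 5.0, 6.0]) @ Q.T

f = rng.standard_normal(6)
u = np.linalg.solve(A.T @ A, A.T @ f)            # squared system (15.105)

assert np.allclose(A @ u, f)                     # solves the original problem
assert np.all(np.linalg.eigvalsh(A.T @ A) > 0)   # A^T A is positive definite
# ...but the condition number is squared:
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A)**2)
```

The last assertion is the price of the cure: the squared system is definite but twice as ill-conditioned on a logarithmic scale, which is why the preconditioning discussed next matters.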

However, we have already seen that preconditioning is strongly recommended even for the unsquared matrix. For Λ and Λ^T Λ, finite difference preconditioning will give a


condition number for the squared matrix which is O(1), independent of N. This tactic of squaring the matrix should work even for the difficult case of the Laplace Tidal equation, provided that none of its eigenvalues are too close to zero.7

What if Λ has complex eigenvalues? First, the transpose in (15.105) should be interpreted as the Hermitian adjoint. Second, the Chebyshev acceleration of Richardson's iteration (first edition of this book) should be avoided unless the imaginary parts are small. Richardson's iteration, with or without preconditioning, will still converge rapidly as long as the real parts of the eigenvalues are all positive, and better still, so will its Minimum Residual