there; this behavioral boundary condition is implicitly satisfied by a truncated Chebyshev

series because these polynomials are analytic at x = ±1.

Behavioral boundary conditions are the rule rather than the exception in spherical or

cylindrical geometry, or whenever the coordinate system introduces singularities that do

not reflect the physics of the problem. (Laplace's Tidal Equation is usually expressed in

terms of colatitude φ on a sphere; we have converted it to a form suitable for Chebyshev

polynomials by making the change-of-variable x = cos(φ).) Spherical coordinates are fully

discussed with numerical illustrations in Boyd (1978b) and in Chapter 18.

Another example is Bessel's equation, which arises in polar coordinates or cylindrical

coordinates. The function Jn has an n-th order zero at the origin, but one can blindly

assume a solution in Chebyshev polynomials in radius r, and obtain good results. An

approximation table for J7 (r) is given in Gottlieb and Orszag (1977).

A third class of examples is furnished by problems on an unbounded interval. “Be-

havioral” versus “numerical” boundary conditions for this case are discussed in Chapter

17.

Second, because this is a problem in spherical coordinates, the natural basis set would

be spherical harmonics, which for our special case of zero zonal wavenumber reduce to
Legendre polynomials. Nonetheless, we bravely applied Chebyshev polynomials to compute the smallest

eigenvalue to within less than 8% error by solving a quadratic equation.

As claimed earlier, Chebyshev polynomials work just fine even when they are not the

obvious or the optimum basis set.

CHAPTER 3. GALERKIN & WEIGHTED RESIDUAL METHODS


3.7 Separation-of-Variables & the Galerkin Method

In the pre-computer age, partial differential equations were solved almost exclusively by

“separation-of-variables”. While systematizing this procedure, Sturm and Liouville discovered
that eigenvalue problems generate eigenfunctions which have all the properties

that make them desirable as spectral basis functions. Conversely, one can look backwards

and see that “separation-of-variables” is just a special case of spectral methods.

EXAMPLE: Diffusion Equation

ut = uxx    (3.57)

with boundary conditions of spatial periodicity with a period of 2π and the initial condition

u(x, t = 0) = Q(x) where Q(x) is arbitrary. Since the boundary conditions are periodicity,

the obvious basis is a Fourier series:

u(x, t) = a0(t) + Σ_{n≥1} an(t) cos(nx) + Σ_{n≥1} bn(t) sin(nx)    (3.58)

Since the basis functions are individually periodic, all rows of the Galerkin matrix come

from the differential equation. The cosine rows are

Hmn = (cos(mx), {∂t − ∂xx} cos(nx))    (3.59)
    = (cos(mx), {∂t + n²} cos(nx))    (3.60)
    = {∂t + n²} δmn π    (3.61)

The sine rows are similar. It is not necessary to bother with a particular truncation be-

cause the Galerkin matrix is diagonal and each sine or cosine function is uncoupled from all

the others. The spectral method in x reduces the problem to the uncoupled, independent

ordinary differential equations

a0,t = 0;   an,t + n² an = 0;   bn,t + n² bn = 0    (3.62)

a0(t) = a0(0);   an(t) = an(0) exp(−n² t);   bn(t) = bn(0) exp(−n² t)    (3.63)

where the values of the coefficients at t = 0 are obtained by expanding the initial condition

Q(x) as a Fourier series.
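The diagonal system (3.62)-(3.63) can be checked in a few lines. The sketch below (the function name and the FFT-based expansion of Q(x) are our own illustrative choices, not from the text) evolves each Fourier coefficient independently:

```python
import numpy as np

# Sketch of the Fourier-Galerkin solution (3.62)-(3.63) of u_t = u_xx with
# period 2*pi. Because the Galerkin matrix is diagonal, each Fourier mode
# evolves independently: its coefficient is simply multiplied by exp(-n^2 t).

def solve_diffusion(Q, x, t):
    """Advance the periodic diffusion equation exactly in coefficient space."""
    N = len(x)
    c = np.fft.fft(Q(x))                # complex Fourier coefficients of Q
    n = np.fft.fftfreq(N, d=1.0 / N)    # integer wavenumbers 0, 1, ..., -1
    c *= np.exp(-n**2 * t)              # each mode decays independently
    return np.fft.ifft(c).real

x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
u = solve_diffusion(np.cos, x, t=1.0)   # Q(x) = cos(x): exact answer exp(-t) cos(x)
err = np.max(np.abs(u - np.exp(-1.0) * np.cos(x)))
print(err)   # machine precision: a single mode is evolved exactly
```

Because the modes are uncoupled, no time-stepping error enters at all; the only approximation is the truncation of the initial condition.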

We could alternatively solve this same problem with Chebyshev polynomials. The

Chebyshev series would also generate a set of ordinary differential equations in time. The

difference is that with a basis of anything other than trigonometric functions, all the dif-

ferential equations would be coupled, just as were the pair obtained by Fourier series for

the variable coefficient diffusion equation in Chapter 1. Instead of the explicit solutions of

(3.63), we are left with a mess.

The heart of “separation-of-variables” is to make a clever choice of basis functions so

that the differential equations in the remaining variable are uncoupled. For the constant

coefficient diffusion equation, sines and cosines are the “clever” choice, independent of

the boundary conditions.

Separation-of-variables series often converge at only an algebraic rate. However, this

slow convergence frequently reflects corner singularities in space or space-time, so non-separable
basis sets like Chebyshev and Legendre polynomials converge algebraically, too.

For more complicated equations, alas, it is necessary to use basis sets more complicated

than trigonometric functions to “separate variables”: weird creatures named Mathieu,


Lamé, and parabolic cylinder functions, to list but a few. We shall spare the reader these

exotica. Nevertheless, since most textbooks on the physical sciences written before 1960

are little more than catalogues of successful applications of “separation-of-variables”, it is

important to recognize that it is but a special case of spectral methods.

3.8 Galerkin Method in Quantum Theory: Heisenberg Matrix Mechanics

In quantum mechanics, one computes the energy levels of atoms and molecules by solving

the eigenvalue problem

Hψ = Eψ    (3.64)

where the linear operator H is the “Hamiltonian”, E is the energy and also the eigenvalue,

and ψ(x) is the wave function. Until very recently, spectral methods were the only tool used

on Schroedinger's equation. Indeed, the first half-century of wave mechanics is mostly a

catalogue of increasingly elaborate applications of spectral methods to calculate energy

levels.

Why spectral methods? The first reason is the curse of dimensionality. The wavefunction
for the helium atom, which has two electrons, is a function of the coordinates of both

electrons and thus is six-dimensional even though physical space has only three dimensions.
An N-electron wavefunction is a native of a 3N-dimensional configuration space.

The second reason is that one special case, the hydrogen atom, can be solved in closed,

analytical form by separation-of-variables. The eigenfunctions are the product of Laguerre

functions in radius with spherical harmonics in longitude and latitude; although a bit

messy, the eigenfunctions can be evaluated by recurrence on a four-function calculator. This

is a great gift because the orbits of electrons in molecules and multi-electron atoms are not

too much distorted from those of the hydrogen atom. In other words, it is possible to build

up good approximations to molecular structure from a small number of hydrogenic wave

functions.

The third reason is the Rayleigh-Ritz variational principle. This states that for a lin-

ear, self-adjoint eigenvalue problem, the solution for the lowest eigenvalue is that which

minimizes the “Rayleigh functional”

RR ≡ (ψ, Hψ)/(ψ, ψ) (3.65)

(Physicists call the mode with the lowest eigenvalue the “ground state”.) If we substitute a

trial wavefunction ψtrial into RR(ψ), then E ≈ RR. The special virtue of the Rayleigh-Ritz

principle is that one can prove that

E ≤ RR(ψ) for all ψ    (3.66)

In words, any calculation, even a crude one, will always give an upper bound on the true

energy.
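A minimal numerical illustration of the bound (3.66), using a model problem of our own choosing: −uxx = Eu on [0, π] with u(0) = u(π) = 0, whose true ground-state eigenvalue is E = 1 (eigenfunction sin x). Even the crude polynomial trial function x(π − x) lands just above the true energy:

```python
import numpy as np

# Rayleigh-Ritz bound (3.66) for -u_xx = E u on [0, pi], u(0) = u(pi) = 0.
# True ground state: u = sin(x), E = 1. The polynomial trial function below
# is an arbitrary illustrative choice that satisfies the boundary conditions.

x = np.linspace(0.0, np.pi, 20001)
psi = x * (np.pi - x)           # trial wavefunction
dpsi = np.pi - 2.0 * x          # its exact derivative

# RR = (psi, H psi)/(psi, psi); integrating by parts turns the numerator
# into the integral of (psi')^2. Uniform-grid sums approximate the integrals
# (the common grid spacing cancels in the ratio).
RR = np.sum(dpsi**2) / np.sum(psi**2)
print(RR)   # 10/pi^2 = 1.0132..., strictly above the true E = 1
```

The computed quotient, 10/π² ≈ 1.013, is an upper bound as promised: a 1.3% error from a one-term guess.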

The Rayleigh-Ritz principle allowed the early quantum chemists to search for approx-

imate wave functions with wild abandon. It was rather like breaking an Olympic record;

each guess that broke the previous minimum for E was a triumph since the Rayleigh-Ritz

principle guaranteed that this had to be a better approximation than any wavefunction that

had gone before it. In particular, one did not need to worry about completeness, or even

linearity. If the trial wavefunction depended nonlinearly on certain parameters, one could

still minimize R with respect to those parameters and perhaps obtain a very good approx-

imation.


EXAMPLE: H2+ MOLECULAR ION

This is a one-electron ion with two attracting protons which we shall label “A” and “B”.

Now the ground state for the hydrogen atom is

ψA = [constant] exp[−rA/a0]    (3.67)

where rA is the distance from the electron to proton A and where a0 is the “Bohr radius”.

To simplify notation, we shall ignore the normalizing constants as in (3.67). An obvious

approach is to use hydrogen 1s orbitals like (3.67), but to take

ψ ≈ ψA ± ψB    (3.68)

where ψA is given by (3.67) and where ψB has the same functional form except that rA is re-

placed by rB , the distance from the electron to the second proton. This trial solution allows

for the electron's freedom to orbit either proton or to trace figure-eights. The symmetry

(both protons have equal charge) implies that ψA and ψB must come in with equal

amplitude, but we must allow for the possibility of the minus sign. Graphing the Rayleigh

functional as a function of the internuclear separation rN then gives a prediction for E, and

simultaneously for the distance between the two protons.

Crude as this approximation is, it correctly predicts negative energy for the plus sign in

(3.68); the hydrogen molecular ion is a stable species. The observed internuclear separation distance

is such that contours of wavefunction amplitude are nearer the protons in the ion than for

the hydrogen atom, so this suggests improving (3.68) by taking ψA to be of the same shape

as (3.67), but with a re-scaled radial dependence:

ψA ≡ exp[−crA/a0]    (3.69)

and similarly for ψB . Minimizing R with respect to both c and rN gives the lowest energy

for c = 1.24.
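The effect of a screening parameter is easiest to check on a case with a closed form: the hydrogen atom itself rather than the molecular ion. In atomic units the trial wavefunction exp(−cr) yields the Rayleigh functional E(c) = c²/2 − c (a standard textbook result, assumed here rather than derived); minimizing over c recovers the exact ground state, illustrating the same one-parameter strategy:

```python
import numpy as np

# One-parameter variational minimization for the hydrogen atom in atomic
# units (a0 = 1, energies in hartrees). For the trial wavefunction
# exp(-c*r), the Rayleigh functional reduces to E(c) = c**2/2 - c, a
# closed form taken from standard quantum mechanics texts.

c = np.linspace(0.5, 2.0, 15001)    # grid of screening parameters
E = 0.5 * c**2 - c                  # Rayleigh functional E(c)
i = np.argmin(E)
print(c[i], E[i])                   # minimum at c = 1, E = -0.5 hartree (exact)
```

For hydrogen the optimal screening is c = 1 because the trial family contains the exact ground state; in the molecular ion the second nucleus pulls the optimum to c = 1.24.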

The third improvement is to note that the pull of the other proton tends to distort the

hydrogenic orbitals so that they are elongated in the direction of the other positive charge.

This “polarity effect” suggests

ψA = exp[−crA/a0] + λx exp[−crA/a0]    (3.70)

where the x-axis is aligned along the line between the two protons. The result of minimizing
the Rayleigh functional with respect to c and λ is shown as a function of rN in

Fig. 3.3, which is taken from Coulson (1961). In Coulson's words, “the true energy differs

inappreciably from (iii) [7.7]”.

This is a stirring success for an approximation with just three parameters. When more

terms are included, as is necessary for more complicated molecules, it becomes a great

burden to use nonlinear parameters, so chemists are forced to use approximations that

depend linearly on the unknowns, that is, a spectral series. The Rayleigh-Ritz method is

then equivalent to the Galerkin method. The “valence bond” and “molecular orbital” theories are

merely different choices of spectral basis functions. The acronym often used for the latter

, L.C.A.O.-M.O. theory, sums up the underlying philosophy of both: “Linear Combinations
of Atomic Orbitals - Molecular Orbitals”. In other words, the approximations for

molecules are assembled using atomic orbitals as the basis functions.

For some types of bonds, the appropriate atomic orbitals are the “s-orbitals” of hydro-

gen; for others, the p-, d- or higher orbitals are appropriate. For the benzene ring, for

example, which consists of six carbon atoms at the corners of a hexagon, the simplest ap-

proximation is

ψ ≈ a1 ψ1 + a2 ψ2 + a3 ψ3 + a4 ψ4 + a5 ψ5 + a6 ψ6 (3.71)


Figure 3.3: A graph of the energy for the hydrogen molecular ion H2+ as a function of the
internuclear separation rN in units of the Bohr radius a0. (i) Sum of two hydrogen orbitals,
ψ = ψA + ψB. (ii) Sum of two “screened” hydrogen orbitals which incorporate the screening
parameter c. (iii) ψA = exp(−crA/a0) + λx exp(−crA/a0), and similarly for ψB, where λ is
known as the “polarity” constant. The true energy differs negligibly from the minimum of
(iii). After Coulson (1961) and Dickinson (1933).

where the ψi's are hydrogen p-orbitals centered on each of the six carbon atoms and where
the ai's are the spectral coefficients. The separation of the carbon atoms, although it can be

calculated as for the hydrogen molecular ion, is usually taken as an observational given at

this (crude) order of approximation so that the problem is linear in the variational parameters.
Minimizing the Rayleigh functional with respect to the ai's gives the matrix problem

[ (ψ1, {H − E}ψ1)   (ψ1, {H − E}ψ2)   ...   (ψ1, {H − E}ψ6) ] [ a1 ]   [ 0 ]
[ (ψ2, {H − E}ψ1)         ...         ...          ...       ] [ a2 ]   [ 0 ]
[        ...              ...         ...          ...       ] [ .. ] = [ .. ]    (3.72)
[ (ψ6, {H − E}ψ1)         ...         ...   (ψ6, {H − E}ψ6) ] [ a6 ]   [ 0 ]

Chemists, for historical reasons, call determinants of this form “secular determinants”,

but the matrices are the same as those created by the Galerkin method. Eq. (3.72) is a

standard matrix eigenvalue equation with E as the eigenvalue.
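A sketch of the simplest concrete version of (3.72), the Hückel approximation for the benzene ring: the basis is treated as orthonormal, (ψi, Hψi) = α, and (ψi, Hψj) = β for adjacent carbons and zero otherwise. The parameters α and β are empirical; the values below are placeholders chosen only so the eigenvalue pattern is visible:

```python
import numpy as np

# Hueckel form of the secular problem (3.72) for the benzene ring: six
# p-orbital basis functions, (psi_i, H psi_i) = alpha, (psi_i, H psi_j) = beta
# for neighbouring carbons, zero otherwise, with an orthonormal basis.
# alpha and beta are empirical parameters; placeholder values used here.

alpha, beta = 0.0, -1.0
H = alpha * np.eye(6)
for i in range(6):                   # ring: carbon i bonds to carbon i+1 (mod 6)
    H[i, (i + 1) % 6] = beta
    H[(i + 1) % 6, i] = beta

E = np.sort(np.linalg.eigvalsh(H))   # the six Hueckel energy levels
print(E)   # alpha + 2*beta, alpha + beta (x2), alpha - beta (x2), alpha - 2*beta
```

The circulant structure of the ring makes the spectrum α + 2β cos(2πk/6); the doubly degenerate pairs are the familiar benzene level pattern.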

As Fig. 3.3 shows, heuristic guessing of basis functions which are not standard basis

functions can be very effective when coupled with the Rayleigh-Ritz principle and an ap-

propriate measure of desperation. This strategy is a good way of generating low-order

analytical approximations in other ¬elds, too, but it can be risky. With the Rayleigh-Ritz

principle, a bad guess is immediately evident: its lowest energy is larger than that of previ-

ous calculations. Victory is as clear and well-de¬ned as breaking a record in track-and-¬eld.

In other scienti¬c areas, this strategy has also been tried, but the risks are much higher.

In the absence of the Rayleigh-Ritz principle, guessing trial functions may make Galerkin

and collocation approximations worse instead of better (Finlayson, 1973).

Fortunately, it is possible to regain the security of the Rayleigh-Ritz principle. The remedy
is to minimize the square of the residual, as explained in Appendix G of the first edition of

this book (1989). The cost is that the programming is more complicated and the execution

time and storage are somewhat increased.


3.9 The Galerkin Method Today

In the next chapter, we will discuss pseudospectral/collocation algorithms, so it is a little

premature to compare different methods. However, a few preliminary remarks are in order.

First, the Galerkin method is more difficult to program than pseudospectral algorithms.

The reason is simply that it is easier to evaluate a function than it is to integrate it.

It is often possible to replace the numerical integration by use of trigonometric identities,
but this makes it more difficult to test the program because a simple problem with a
known, exact solution may not use all the identities which are needed to solve more complex
equations (O'Connor, 1996). In contrast, a collocation program (or Galerkin method

with numerical evaluation of the matrix elements) does not use special tricks, and therefore

will either succeed or fail for both the test problem and the real one.

Second, the Galerkin method with a truncation of N is usually equal in accuracy to a

pseudospectral method with a truncation of N +1 or N +2. When N is large, this difference

is negligible, and pseudospectral algorithms are better. When N is small, however, that

extra one or two degrees of freedom can be very important.

We have already seen that spectral methods are often accurate even for N so small that

one can substitute paper-and-pencil for a computer. Thus, spectral methods are not merely

a tool for numerical analysis, but for theoretical analysis, too. For theoretical purposes, the

smallness of N usually implies that one needs that extra accuracy given by the Galerkin

method. Furthermore, if a problem is simple enough so that one can analytically perform

the linear algebra, one can usually analytically evaluate the integrals for the Galerkin ma-

trix elements, too. Finlayson (1973) is full of such examples, drawn mostly from engineer-

ing. A full discussion with programs in the algebraic manipulation languages REDUCE

and Maple is given in Chapter 20 and Boyd (1993).
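As a small illustration of evaluating Galerkin matrix elements analytically by computer algebra (the text cites REDUCE and Maple; sympy is used here as a freely available stand-in), the integrals behind (3.59)-(3.61) can be done symbolically:

```python
import sympy as sp

# Symbolic evaluation of the Galerkin matrix elements (3.59)-(3.61) for the
# periodic diffusion problem. sympy stands in for the REDUCE and Maple
# programs mentioned in the text.

x = sp.symbols('x')

def element(m, n):
    """(cos(mx), -d^2/dx^2 cos(nx)) over one period [0, 2*pi]."""
    integrand = sp.cos(m * x) * (-sp.diff(sp.cos(n * x), x, 2))
    return sp.integrate(integrand, (x, 0, 2 * sp.pi))

print(element(2, 2))   # diagonal element: n^2 * pi = 4*pi
print(element(2, 3))   # off-diagonal: 0, since distinct modes are orthogonal
```

The exact zero for m ≠ n is the diagonality that made (3.62) uncoupled; a computer algebra system delivers it without numerical quadrature error.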