

analytical at the endpoints even though the differential equation has singular coefficients
there; this behavioral boundary condition is implicitly satisfied by a truncated Chebyshev
series because these polynomials are analytic at x = ±1.
Behavioral boundary conditions are the rule rather than the exception in spherical or
cylindrical geometry, or whenever the coordinate system introduces singularities that do
not reflect the physics of the problem. (Laplace's Tidal Equation is usually expressed in
terms of colatitude φ on a sphere; we have converted it to a form suitable for Chebyshev
polynomials by making the change-of-variable x = cos(φ).) Spherical coordinates are fully
discussed with numerical illustrations in Boyd (1978b) and in Chapter 18.
Another example is Bessel's equation, which arises in polar or cylindrical coordinates.
The function Jn has an n-th order zero at the origin, but one can blindly
assume a solution in Chebyshev polynomials in radius r and obtain good results. An
approximation table for J7(r) is given in Gottlieb and Orszag (1977).
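That claim is easy to check in a few lines. The sketch below is our own illustration, not from the text: it interpolates J7(r) by a Chebyshev series on the interval [0, 20] (the interval and degree are arbitrary choices), imposing no explicit condition at r = 0, and measures the maximum error.

```python
import numpy as np
from scipy.special import jv

# J7(r) has a 7th-order zero at r = 0, but we impose no condition there;
# the "behavioral" boundary condition is satisfied automatically because
# Chebyshev polynomials are analytic on the whole interval.
deg = 40
cheb = np.polynomial.chebyshev.Chebyshev.interpolate(
    lambda r: jv(7, r), deg, domain=[0.0, 20.0])

r = np.linspace(0.0, 20.0, 2001)
err = np.max(np.abs(cheb(r) - jv(7, r)))
print(f"max error of degree-{deg} Chebyshev fit to J7: {err:.2e}")
```

Because J7 is an entire function, the coefficients decay supergeometrically and a modest degree already gives near machine-precision accuracy.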
A third class of examples is furnished by problems on an unbounded interval. "Behavioral"
versus "numerical" boundary conditions for this case are discussed in the chapter on
unbounded domains.
Second, because this is a problem in spherical coordinates, the natural basis set would
be spherical harmonics, which reduce to Legendre polynomials for our special case of zero
zonal wavenumber. Nonetheless, we bravely applied Chebyshev polynomials to compute the
smallest eigenvalue to within less than 8% error by solving a quadratic equation.
As claimed earlier, Chebyshev polynomials work just fine even when they are not the
obvious or the optimum basis set.

3.7 Separation-of-Variables & the Galerkin Method
In the pre-computer age, partial differential equations were solved almost exclusively by
"separation-of-variables". While systematizing this procedure, Sturm and Liouville dis-
covered that eigenvalue problems generate eigenfunctions which have all the properties
that make them desirable as spectral basis functions. Conversely, one can look backwards
and see that "separation-of-variables" is just a special case of spectral methods.

EXAMPLE: Diffusion Equation

u_t = u_xx

with boundary conditions of spatial periodicity with a period of 2π and the initial condition
u(x, t = 0) = Q(x), where Q(x) is arbitrary. Since the boundary condition is periodicity,
the obvious basis is a Fourier series:
u(x, t) = a_0(t) + Σ_{n=1}^∞ a_n(t) cos(nx) + Σ_{n=1}^∞ b_n(t) sin(nx)

Since the basis functions are individually periodic, all rows of the Galerkin matrix come
from the differential equation. The cosine rows are

  (cos(mx), {∂_t − ∂_xx} cos(nx))          (3.59)
= (cos(mx), {∂_t + n^2} cos(nx))           (3.60)
= {∂_t + n^2} δ_mn π                       (3.61)

The sine rows are similar. It is not necessary to bother with a particular truncation be-
cause the Galerkin matrix is diagonal and each sine or cosine function is uncoupled from all
the others. The spectral method in x reduces the problem to the uncoupled, independent
ordinary differential equations

a_n,t + n^2 a_n = 0;   b_n,t + n^2 b_n = 0;   a_0,t = 0          (3.62)

a_n(t) = a_n(0) exp(−n^2 t);   b_n(t) = b_n(0) exp(−n^2 t);   a_0(t) = a_0(0)          (3.63)

where the values of the coefficients at t = 0 are obtained by expanding the initial condition
Q(x) as a Fourier series.
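The uncoupled system (3.62)-(3.63) is easy to verify numerically. The sketch below is our own illustration: the choice Q(x) = cos(3x) + 2 sin(x) is ours, made so that an exact solution is available for comparison. The FFT supplies the coefficients of Q, and each mode then decays independently.

```python
import numpy as np

# Solve u_t = u_xx with 2*pi periodicity via the diagonal Fourier-Galerkin
# system (3.62)-(3.63).  Test initial condition (our choice):
#   Q(x) = cos(3x) + 2 sin(x),
# whose exact solution is cos(3x) exp(-9t) + 2 sin(x) exp(-t).
M = 64
x = 2 * np.pi * np.arange(M) / M
Q = np.cos(3 * x) + 2 * np.sin(x)

c0 = np.fft.rfft(Q)                 # complex Fourier coefficients at t = 0
n = np.arange(c0.size)              # wavenumbers 0 .. M/2
t = 0.1
u = np.fft.irfft(c0 * np.exp(-n**2 * t), n=M)   # each mode decays as exp(-n^2 t)

u_exact = np.cos(3 * x) * np.exp(-9 * t) + 2 * np.sin(x) * np.exp(-t)
err = np.max(np.abs(u - u_exact))
print("max error:", err)
```

Since Q contains only the wavenumbers n = 1 and n = 3, the spectral solution agrees with the exact solution to machine precision.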
We could alternatively solve this same problem with Chebyshev polynomials. The
Chebyshev series would also generate a set of ordinary differential equations in time. The
difference is that with a basis of anything other than trigonometric functions, all the dif-
ferential equations would be coupled, just as were the pair obtained by Fourier series for
the variable coefficient diffusion equation in Chapter 1. Instead of the explicit solutions of
(3.63), we are left with a mess.
The heart of "separation-of-variables" is to make a clever choice of basis functions so
that the differential equations in the remaining variable are uncoupled. For the constant
coefficient diffusion equation, sines and cosines are the "clever" choice, independent of
the boundary conditions.
Separation-of-variables series often converge at only an algebraic rate. However, this
slow convergence frequently reflects corner singularities in space or space-time, so non-
separable basis sets like Chebyshev and Legendre polynomials converge algebraically, too.
For more complicated equations, alas, it is necessary to use basis sets more complicated
than trigonometric functions to "separate variables": weird creatures named Mathieu,
Lamé, and parabolic cylinder functions, to list but a few. We shall spare the reader these
exotica. Nevertheless, since most textbooks on the physical sciences written before 1960
are little more than catalogues of successful applications of "separation-of-variables", it is
important to recognize that it is but a special case of spectral methods.

3.8 Galerkin Method in Quantum Theory: Heisenberg Matrix Mechanics
In quantum mechanics, one computes the energy levels of atoms and molecules by solving
the eigenvalue problem

Hψ = Eψ

where the linear operator H is the “Hamiltonian”, E is the energy and also the eigenvalue,
and ψ(x) is the wave function. Until very recently, spectral methods were the only tool used
on Schroedinger's equation. Indeed, the first half-century of wave mechanics is mostly a
catalogue of increasingly elaborate applications of spectral methods to calculate energy
levels.
Why spectral methods? The first reason is the curse of dimensionality. The wavefunc-
tion for the helium atom, which has two electrons, is a function of the coordinates of both
electrons and thus is six-dimensional even though physical space has only three dimen-
sions. An N-electron wavefunction is a native of a 3N-dimensional configuration space.
The second reason is that one special case, the hydrogen atom, can be solved in closed,
analytical form by separation-of-variables. The eigenfunctions are the product of Laguerre
functions in radius with spherical harmonics in longitude and latitude; although a bit
messy, the eigenfunctions can be evaluated by recurrence on a four-function calculator. This
is a great gift because the orbits of electrons in molecules and multi-electron atoms are not
too much distorted from those of the hydrogen atom. In other words, it is possible to build
up good approximations to molecular structure from a small number of hydrogenic wave
functions.
The third reason is the Rayleigh-Ritz variational principle. This states that for a lin-
ear, self-adjoint eigenvalue problem, the solution for the lowest eigenvalue is that which
minimizes the “Rayleigh functional”

RR ≡ (ψ, Hψ)/(ψ, ψ) (3.65)

(Physicists call the mode with the lowest eigenvalue the “ground state”.) If we substitute a
trial wavefunction ψtrial into RR(ψ), then E ≈ RR. The special virtue of the Rayleigh-Ritz
principle is that one can prove that

E ≤ RR(ψ)    for all ψ          (3.66)

In words, any calculation, even a crude one, will always give an upper bound on the true
energy.
The Rayleigh-Ritz principle allowed the early quantum chemists to search for approx-
imate wave functions with wild abandon. It was rather like breaking an Olympic record;
each guess that broke the previous minimum for E was a triumph since the Rayleigh-Ritz
principle guaranteed that this had to be a better approximation than any wavefunction that
had gone before it. In particular, one did not need to worry about completeness, or even
linearity. If the trial wavefunction depended nonlinearly on certain parameters, one could
still minimize RR with respect to those parameters and perhaps obtain a very good approx-
imation.

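The upper-bound property (3.66) is easy to see in action. The sketch below is our own example, not from the text: it uses the quantum harmonic oscillator H = -(1/2) d^2/dx^2 + (1/2) x^2, whose exact ground-state energy is 1/2, with the Gaussian trial function ψ = exp(-a x^2).

```python
import numpy as np

# Rayleigh functional RR = (psi, H psi) / (psi, psi) for the harmonic
# oscillator, evaluated by quadrature.  The kinetic term is written as
# (1/2) * integral of (psi')^2, which equals (psi, -(1/2) psi'') after
# integration by parts for a decaying psi.
def rayleigh(a):
    x = np.linspace(-8.0, 8.0, 4001)
    dx = x[1] - x[0]
    psi = np.exp(-a * x**2)
    dpsi = -2.0 * a * x * psi                    # analytic derivative
    kinetic = 0.5 * np.sum(dpsi**2) * dx
    potential = 0.5 * np.sum(x**2 * psi**2) * dx
    norm = np.sum(psi**2) * dx
    return (kinetic + potential) / norm

for a in [0.2, 0.5, 1.0, 2.0]:
    print(f"a = {a}: RR = {rayleigh(a):.6f}")
```

Every trial value of a gives RR ≥ 1/2, as (3.66) guarantees; the minimum at a = 1/2 recovers the ground state exactly only because the Gaussian happens to be the true eigenfunction.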
EXAMPLE: The Hydrogen Molecular Ion

This is a one-electron ion with two attracting protons which we shall label "A" and "B".
Now the ground state for the hydrogen atom is

ψA = [constant] exp[−rA/a0]          (3.67)

where rA is the distance from the electron to proton A and where a0 is the “Bohr radius”.
To simplify notation, we shall ignore the normalizing constants as in (3.67). An obvious
approach is to use hydrogen 1s orbitals like (3.67), but to take

ψ ≈ ψA ± ψB          (3.68)

where ψA is given by (3.67) and where ψB has the same functional form except that rA is re-
placed by rB, the distance from the electron to the second proton. This trial solution allows
for the electron's freedom to orbit either proton or to trace figure-eights. The symmetry
(both protons have equal charge) implies that ψA and ψB must come in with equal
amplitude, but we must allow for the possibility of the minus sign. Graphing the Rayleigh
functional as a function of the internuclear separation rN then gives a prediction for E, and
simultaneously for the distance between the two protons.
Crude as this approximation is, it correctly predicts negative energy for the plus sign in
(3.68); the hydrogen molecular ion is a stable species. The observed internuclear separation distance
is such that contours of wavefunction amplitude are nearer the protons in the ion than for
the hydrogen atom, so this suggests improving (3.68) by taking ψA to be of the same shape
as (3.67), but with a re-scaled radial dependence:

ψA ≡ exp[−crA/a0]          (3.69)

and similarly for ψB. Minimizing the Rayleigh functional with respect to both c and rN gives the lowest energy
for c = 1.24.
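For the unscreened case c = 1, the Rayleigh functional for the trial function ψA + ψB reduces to closed-form overlap, Coulomb, and exchange integrals. The sketch below uses those standard quantum-chemistry formulas (quoted from the literature, not derived in the text; atomic units, energies in hartree and R in Bohr radii) to locate the energy minimum.

```python
import numpy as np

# LCAO energy of H2+ for psi = psi_A + psi_B with unscreened 1s orbitals.
def E_plus(R):
    S = np.exp(-R) * (1 + R + R**2 / 3)                 # overlap integral
    J = -(1.0 / R - np.exp(-2 * R) * (1 + 1.0 / R))     # Coulomb integral
    K = -np.exp(-R) * (1 + R)                           # exchange integral
    return -0.5 + (J + K) / (1 + S) + 1.0 / R           # + proton repulsion

R = np.linspace(1.0, 6.0, 501)
E = E_plus(R)
i = np.argmin(E)
print(f"minimum E = {E[i]:.4f} hartree at R = {R[i]:.2f} a0")
```

A separated hydrogen atom plus a bare proton has E = -0.5 hartree, so a minimum below -0.5 means the ion is bound, as the text asserts; allowing c to vary (the screened case) lowers the energy further.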
The third improvement is to note that the pull of the other proton tends to distort the
hydrogenic orbitals so that they are elongated in the direction of the other positive charge.
This “polarity effect” suggests

ψA = exp[−crA/a0] + λx exp[−crA/a0]          (3.70)

where the x-axis is aligned along the line between the two protons. The result of min-
imizing the Rayleigh functional with respect to c and λ is shown as a function of rN in
Fig. 3.3, which is taken from Coulson (1961). In Coulson's words, "the true energy differs
inappreciably from (iii) [7.7]".
This is a stirring success for an approximation with just three parameters. When more
terms are included, as is necessary for more complicated molecules, it becomes a great
burden to use nonlinear parameters, so chemists are forced to use approximations that
depend linearly on the unknowns, that is, a spectral series. The Rayleigh-Ritz method is
then equivalent to the Galerkin. The “valence bond” and “molecular orbital” theories are
merely different choices of spectral basis functions. The acronym often used for the latter,
"L. C. A. O.-M. O." theory, sums up the underlying philosophy of both: "Linear Com-
binations of Atomic Orbitals - Molecular Orbitals". In other words, the approximations for
molecules are assembled using atomic orbitals as the basis functions.
For some types of bonds, the appropriate atomic orbitals are the “s-orbitals” of hydro-
gen; for others, the p-, d- or higher orbitals are appropriate. For the benzene ring, for
example, which consists of six carbon atoms at the corners of a hexagon, the simplest ap-
proximation is

ψ ≈ a1 ψ1 + a2 ψ2 + a3 ψ3 + a4 ψ4 + a5 ψ5 + a6 ψ6 (3.71)

Figure 3.3: A graph of the energy for the hydrogen molecular ion H2+, as a function of the
internuclear separation RN in units of the Bohr radius a0. (i) Sum of two hydrogen orbitals,
ψ = ψA + ψB. (ii) Sum of two "screened" hydrogen orbitals which incorporate the screening
parameter c. (iii) ψA = exp(−crA/a0) + λx exp(−crA/a0), and similarly for ψB, where λ is
known as the "polarity" constant. The true energy differs negligibly from the minimum of
(iii). After Coulson (1961) and Dickinson (1933).

where the ψi's are hydrogen p-orbitals centered on each of the six carbon atoms and where
the ai's are the spectral coefficients. The separation of the carbon atoms, although it can be
calculated as for the hydrogen molecular ion, is usually taken as an observational given at
this (crude) order of approximation so that the problem is linear in the variational parame-
ters. Minimizing the Rayleigh functional with respect to the ai's gives the matrix problem

| (ψ1, {H − E}ψ1)   (ψ1, {H − E}ψ2)   ...   (ψ1, {H − E}ψ6) | | a1 |   | 0 |
| (ψ2, {H − E}ψ1)        ...          ...        ...        | | a2 |   | 0 |
|      ...               ...          ...        ...        | | .. | = | . |
| (ψ6, {H − E}ψ1)        ...          ...   (ψ6, {H − E}ψ6) | | a6 |   | 0 |          (3.72)

Chemists, for historical reasons, call determinants of this form “secular determinants”,
but the matrices are the same as those created by the Galerkin method. Eq. (3.72) is a
standard matrix eigenvalue equation with E as the eigenvalue.
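A concrete, simplified instance of (3.72) is the Hückel approximation: assume the six p-orbitals are orthonormal, with (ψi, Hψi) = α and (ψi, Hψj) = β for adjacent carbons only. These simplifying assumptions are ours, not the text's; with them the secular problem becomes an ordinary symmetric eigenvalue problem for the benzene ring.

```python
import numpy as np

# Hückel secular matrix for benzene: six carbons in a ring, diagonal
# alpha, nearest-neighbor coupling beta.  We set alpha = 0, beta = -1,
# i.e. energies are measured from alpha in units of |beta|.
alpha, beta = 0.0, -1.0
H = np.zeros((6, 6))
for i in range(6):
    H[i, i] = alpha
    H[i, (i + 1) % 6] = beta        # ring connectivity 1-2-3-4-5-6-1
    H[(i + 1) % 6, i] = beta

E = np.sort(np.linalg.eigvalsh(H))
print(E)   # alpha + 2*beta, alpha + beta (twice), alpha - beta (twice), alpha - 2*beta
```

The six eigenvalues α + 2β cos(2πk/6), k = 0, ..., 5, are the familiar Hückel energy levels of benzene; the same code pattern applies to any secular matrix of the form (3.72) once the matrix elements are known.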
As Fig. 3.3 shows, heuristic guessing of basis functions which are not standard basis
functions can be very effective when coupled with the Rayleigh-Ritz principle and an ap-
propriate measure of desperation. This strategy is a good way of generating low-order
analytical approximations in other fields, too, but it can be risky. With the Rayleigh-Ritz
principle, a bad guess is immediately evident: its lowest energy is larger than that of previ-
ous calculations. Victory is as clear and well-defined as breaking a record in track-and-field.
In other scientific areas, this strategy has also been tried, but the risks are much higher.
In the absence of the Rayleigh-Ritz principle, guessing trial functions may make Galerkin
and collocation approximations worse instead of better (Finlayson, 1973).
Fortunately, it is possible to regain the security of the Rayleigh-Ritz principle. The remedy
is to minimize the square of the residual, as explained in Appendix G of the first edition of
this book (1989). The cost is that the programming is more complicated and the execution
time and storage are somewhat increased.

3.9 The Galerkin Method Today
In the next chapter, we will discuss pseudospectral/collocation algorithms, so it is a little
premature to compare different methods. However, a few preliminary remarks are in order.
First, the Galerkin method is more difficult to program than pseudospectral algorithms.
The reason is simply that it is easier to evaluate a function than it is to integrate it.
It is often possible to replace the numerical integration by use of trigonometric identi-
ties, but this makes it more difficult to test the program because a simple problem with a
known, exact solution may not use all the identities which are needed to solve more com-
plex equations (O'Connor, 1996). In contrast, a collocation program (or Galerkin method
with numerical evaluation of the matrix elements) does not use special tricks, and therefore
will either succeed or fail for both the test problem and the real one.
Second, the Galerkin method with a truncation of N is usually equal in accuracy to a
pseudospectral method with a truncation of N +1 or N +2. When N is large, this difference
is negligible, and pseudospectral algorithms are better. When N is small, however, that
extra one or two degrees of freedom can be very important.
We have already seen that spectral methods are often accurate even for N so small that
one can substitute paper-and-pencil for a computer. Thus, spectral methods are not merely
a tool for numerical analysis, but for theoretical analysis, too. For theoretical purposes, the
smallness of N usually implies that one needs that extra accuracy given by the Galerkin
method. Furthermore, if a problem is simple enough so that one can analytically perform
the linear algebra, one can usually analytically evaluate the integrals for the Galerkin ma-
trix elements, too. Finlayson (1973) is full of such examples, drawn mostly from engineer-
ing. A full discussion with programs in the algebraic manipulation languages REDUCE
and Maple is given in Chapter 20 and Boyd (1993).

