

The basis functions have been rescaled so that

(φn , φn ) = 1

This implies that the coefficients of an arbitrary function f(x) are simply

an = (φn , f )

The property of orthonormality, as opposed to mere orthogonality, is much more useful
for some basis functions than for others. Since the normalization factor is 1/√π for all
Fourier and Chebyshev terms (except the constant), it is usually not worth the bother to
“normalize” these basis sets. The simplest definition of Hermite functions (integer coef-
ficients) unfortunately leads to normalization factors which grow wildly with n, so it is
quite helpful to use orthonormalized Hermite functions. (“Wildly” is not an exaggeration;
I have sometimes had overflow problems with unnormalized Hermite functions!) From a
theoretical standpoint, it does not matter in the least whether the basis is orthonormal or
not so long as one is consistent.
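A few lines of Python illustrate the growth. The code relies on the standard identity that the unnormalized Hermite polynomial Hn has squared norm 2ⁿ n! √π under the weight e^(−x²); that identity comes from standard references, not from the text above, so this is a sketch rather than the author's computation:

```python
# Sketch: normalization factors of unnormalized Hermite polynomials.
# Assumes the standard identity  integral of H_n(x)^2 exp(-x^2) dx = 2^n n! sqrt(pi)
# (from standard references, not from the text above).
import math

def hermite_norm(n):
    """Norm of the degree-n Hermite polynomial H_n under the weight exp(-x^2)."""
    return math.sqrt(2.0**n * math.factorial(n) * math.sqrt(math.pi))

for n in (5, 15, 30):
    # The factors grow factorially; by n ~ 150 the squared norm
    # already approaches the float64 overflow threshold.
    print(n, hermite_norm(n))
```

Even at n = 30 the normalization factor exceeds 10²⁰, which is why dividing it out once and for all, i.e. orthonormalizing, is the safer convention for Hermite functions.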
Orthogonality, however, is always useful. It seems so special that one might think it
hard to obtain orthogonal sets of functions, but in fact it is quite easy. All the sets which
we shall use are the eigenfunctions of a class of eigenvalue equations known as “Sturm-
Liouville” problems. These two 19th century mathematicians proved that all problems of
this class generate complete, orthogonal basis sets.
Alternatively, we can choose a set of arbitrary functions. By applying a simple recursive
procedure called “Gram-Schmidt” orthogonalization, which is discussed in most linear al-
gebra texts, one can systematically find linear combinations of the chosen functions which
are mutually orthogonal. For example, the new basis functions defined by (3.16) are not or-
thogonal, but can be made so by the Gram-Schmidt process. The orthogonal polynomials
in this book (Chebyshev, Legendre, Hermite, and so on) can be described as the re-
sult of Gram-Schmidt orthogonalization of the set of powers of x, {1, x, x², . . . }, on various
intervals with various weight functions.
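The Gram-Schmidt recipe is short enough to sketch in code. The particular choices below (the interval [−1, 1], a unit weight function, and a midpoint-rule inner product) are illustrative assumptions, not prescriptions from the text; with them, orthogonalizing {1, x, x²} reproduces the monic Legendre polynomials:

```python
# Sketch of Gram-Schmidt orthogonalization of the powers {1, x, x^2}.
# Assumed setup (not from the text): interval [-1, 1], unit weight,
# midpoint-rule quadrature for the inner products.
import numpy as np

N = 4000
dx = 2.0 / N
x = -1.0 + dx * (np.arange(N) + 0.5)       # midpoints of N cells on [-1, 1]

def inner(f, g):
    """Midpoint-rule approximation of the inner product (f, g) on [-1, 1]."""
    return float(np.sum(f * g) * dx)

basis = []
for p in (x**0, x**1, x**2):               # the powers 1, x, x^2 in turn
    q = p.copy()
    for b in basis:                        # subtract projection onto each earlier member
        q = q - (inner(q, b) / inner(b, b)) * b
    basis.append(q)

# Monic Gram-Schmidt on {1, x, x^2} gives 1, x, and x^2 - 1/3, which is
# proportional to the Legendre polynomial P2(x) = (3x^2 - 1)/2.
print(float(np.max(np.abs(basis[2] - (x**2 - 1.0 / 3.0)))))
```

Swapping in a different interval or a nontrivial weight function inside `inner` yields the other classical families in the same way.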
Since the properties of Sturm-Liouville eigenfunctions are not important in applica-
tions, the theoretical details will be left to Morse and Feshbach (1953) and Carrier and
Pearson (1968). However, Sturm-Liouville theory is of enormous conceptual importance
because it is a general way of obtaining basis functions that are both complete and orthogo-
nal. Furthermore, the Sturm-Liouville methodology supplies the weight function ω(x) and
the interval [a, b] for each basis set as summarized in Appendix A.

One final comment on orthogonality: as noted earlier, it is important that the basis
functions be as different as possible; adding terms that can be closely approximated by a
sum of the other basis functions is a waste of time. Obviously, two functions that are very
similar will not be orthogonal, but instead will have an inner product little different from
the inner product of either one with itself.
Orthogonality guarantees that a set of basis functions will be as different from one an-
other as possible; it is a sort of maximization of linear independence.

3.4 Galerkin Method
The residual function R(x; a0 , a1 , . . . , aN ) can be expanded as a series of basis functions
like any other function of x,

R(x; a0, a1, . . . , aN) = Σn rn(a0, a1, . . . , aN) φn(x)

where the coefficients are given by the usual inner product

rn = (R, φn)                                                        (3.29)

The Galerkin method employs the “error distribution principle” that R(x) should be small
in the sense that the first (N + 1) terms of its spectral series are 0. The Fourier and Cheby-
shev series of smooth functions decrease exponentially fast with n, so all the rn for n > N
will presumably be very, very tiny if N is large enough. Thus, forcing the lower degree rn
to be 0 should make R(x) very, very small over the whole interval. In the limit that N → ∞,
R(x) must → 0 and therefore the approximation must converge exponentially fast to the
exact solution.
This strategy is identical to the Galerkin method as previously defined since the “weighted
residual” conditions (φn, R) = 0 are the same as

rn = 0,    n = 0, 1, 2, . . . , N                    (3.30)    [Galerkin method]

in virtue of (3.29).
Because the weighting functions are a complete set, (3.30) must force R(x) → 0 as N →
∞. In contrast to the “method of moments”, Galerkin's method is a very robust algorithm;
the “test” functions are orthogonal, and therefore are highly linearly independent.
When the basis set is not complete for arbitrary functions, but rather is restricted to ba-
sis functions that satisfy homogeneous boundary conditions, one might wonder whether
it is legitimate to limit the “test functions” to these modified basis functions, such as the
set defined by (3.16). The answer is yes. The reason is that, at least in the limit N → ∞,
the residual function R(x) is converging towards the trivial function f(x) ≡ 0. Since this
limiting function and all its derivatives vanish at the endpoints (and everywhere else!), it
follows that f(x) ≡ 0 must satisfy any and all reasonable homogeneous boundary condi-
tions that we may have imposed upon the basis functions. Thus, any set of basis functions
that is complete for u(x) will also be complete for the residual function R(x).
For simplicity, we assume here and in the next section that the operator H is a linear
operator. (This is not as restrictive as it might seem, as explained in Appendix C.) In that
case, the Galerkin conditions can be expressed in the form of a matrix equation. Let H
denote an (N + 1) × (N + 1) square matrix whose elements are given by (3.31), f be the
column vector whose elements are given by (3.32), and a be the column vector whose
elements are the (N + 1) spectral coefficients:

Hij = (φi, Hφj)    i, j = 1, 2, . . . , (N + 1)                      (3.31)

fi = (φi, f)       i = 1, 2, . . . , (N + 1)                         (3.32)

Then the spectral coefficients an are the solution of the matrix equation

Ha = f                                               (3.33)    [Galerkin method]

The i-th row of (3.33) is the inner product of φi(x) with the residual function

R(x; a0, a1, . . . , aN) ≡ −f(x) + Σj aj Hφj                         (3.34)

If the basis functions do not individually satisfy the boundary conditions, then it is neces-
sary to replace some of the rows of (3.33) by equations that express the boundary conditions.
The end product is still a matrix equation like (3.33).
The final step is to solve (3.33) via standard linear algebra libraries such as LINPACK
and EISPACK. The user's chore is simply to (i) calculate the matrix elements in (3.31) and
(3.32) and (ii) evaluate uN(x) by summing the spectral series. It's that simple.
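The two chores can be sketched end-to-end in Python. Everything problem-specific below is an assumption chosen for illustration, not an example from the text: the basis sin(nπx) on [0, 1] (which satisfies homogeneous Dirichlet conditions), the model equation uxx = −2 with exact solution u = x(1 − x), and midpoint quadrature for the inner products:

```python
# Sketch of the Galerkin method for a model problem (all choices assumed,
# not from the text): u_xx = -2 on [0, 1], u(0) = u(1) = 0, exact u = x(1 - x),
# with the basis phi_n = sin(n*pi*x) and midpoint quadrature.
import numpy as np

M, Nq = 8, 2000                            # number of basis functions, quadrature points
dx = 1.0 / Nq
x = dx * (np.arange(Nq) + 0.5)             # midpoint quadrature nodes on [0, 1]

def inner(f, g):
    """Midpoint-rule inner product (f, g) on [0, 1]."""
    return float(np.sum(f * g) * dx)

# Basis and its image under H = d^2/dx^2 (second derivative taken analytically).
phi = [np.sin(n * np.pi * x) for n in range(1, M + 1)]
Hphi = [-(n * np.pi) ** 2 * np.sin(n * np.pi * x) for n in range(1, M + 1)]

# Chore (i): matrix elements H_ij = (phi_i, H phi_j) and f_i = (phi_i, f).
fx = np.full(Nq, -2.0)                     # right-hand side f(x) = -2
Hmat = np.array([[inner(phi[i], Hphi[j]) for j in range(M)] for i in range(M)])
fvec = np.array([inner(phi[i], fx) for i in range(M)])

a = np.linalg.solve(Hmat, fvec)            # solve H a = f

# Chore (ii): sum the spectral series and compare with the exact solution.
uN = sum(a[n] * phi[n] for n in range(M))
err = float(np.max(np.abs(uN - x * (1.0 - x))))
print(err)
```

With only eight sine functions the truncation error is already below one percent, consistent with the rapid decay of the sine coefficients 8/(nπ)³ of x(1 − x).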

3.5 Weak & Strong Forms of Differential Equations: the Usefulness of Integration-by-Parts
The “strong” form of a differential equation is the one we all learned as undergraduates: a
relationship between derivatives that applies at each point on an interval, for example

uxx − q(x)u = −f(x)                                                  (3.35)

In the mathematical and numerical literature, there are frequent references to the “weak”
form, which is based on the Galerkin method.
For simplicity, assume that the differential equation is subject to homogeneous Dirichlet
boundary conditions. (Quite general boundary conditions can be used without altering
what follows.) Then the “weak” form of Eq. (3.35) is: for all “test” functions v(x) in the
appropriate Sobolev space, the solution is that function u(x) such that

(v, uxx − q(x)u) = −(v, f)                                           (3.36)

where the parentheses denote the usual integral inner product. “Appropriate” means that
the test functions must satisfy the boundary conditions.
The weak form is merely Galerkin's method. In practical applications, the Sobolev
space would be restricted to polynomials up to and including degree N for both the test
functions and the solution.
The usefulness of the weak form arises from a trick: integration-by-parts. If we integrate
by parts and multiply by (−1), the weak form can be rewritten without approximation as

(vx, ux) − {v(b)ux(b) − v(a)ux(a)} + (v, qu) = (v, f)                (3.37)

where x ∈ [a, b]. Because the “test” functions v(x) satisfy the homogeneous boundary
conditions, the endpoint terms in the braces are zero, simplifying the “weak” form to

(vx, ux) + (v, qu) = (v, f)                                          (3.38)

The reason that this is useful is that although the original differential equation (in its
“strong” form) is second order, through integration-by-parts we have found an equivalent
statement which involves only first derivatives. This has several benefits.
First, the “weak” form is meaningful even for problems with shock waves or other
pathologies. As long as the first derivative is integrable, even if it has a discontinuity at
some point on the interval, the “weak” form is well-defined.
Second, the weak form widens the choice of basis functions. To solve the strong form
through collocation or the Galerkin method, we need basis functions that have non-zero
second derivatives, so piecewise-linear functions are excluded. Nevertheless, piecewise-
linear or “tent” functions are very widely used in finite elements and give second order
accuracy. “Tent” functions are possible only with the weak form.
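A minimal tent-function Galerkin sketch makes the point. The model problem is an assumption for illustration (uxx = −1 on [0, 1] with u(0) = u(1) = 0, i.e. q = 0 and f = 1, exact solution u = x(1 − x)/2): only the piecewise-constant first derivatives of the tents ever enter the computation.

```python
# Sketch of a piecewise-linear ("tent" function) Galerkin method for the
# weak form (v_x, u_x) = (v, f). Assumed model problem (not from the text):
# u_xx = -1 on [0, 1], u(0) = u(1) = 0, exact solution u = x(1 - x)/2.
import numpy as np

M = 9                                      # number of interior nodes / tent functions
h = 1.0 / (M + 1)
nodes = h * np.arange(1, M + 1)

# Stiffness matrix K_ij = (phi'_i, phi'_j). Tent derivatives are +/- 1/h on
# adjacent cells, giving the tridiagonal pattern (−1, 2, −1)/h.
K = (np.diag(np.full(M, 2.0))
     - np.diag(np.ones(M - 1), 1)
     - np.diag(np.ones(M - 1), -1)) / h

# Load vector b_i = (phi_i, 1): each tent has unit height and area h.
b = np.full(M, h)

u = np.linalg.solve(K, b)
exact = nodes * (1.0 - nodes) / 2.0
# Linear elements with exactly integrated loads are nodally exact for u_xx = -f in 1-D.
print(float(np.max(np.abs(u - exact))))
```

Note that the stiffness matrix is symmetric, a point returned to below, and that no second derivative of the basis was ever required.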
The high-brow way of saying this is to assert that the appropriate Sobolev space is
H₀¹. The subscript “0” means that the elements of the Sobolev space satisfy homogeneous
Dirichlet boundary conditions, that is, v(a) = v(b) = 0 for all test and basis functions. The
superscript “1” means that the first derivatives of the functions in the space must be in the
space L², that is, the square of vx must be integrable on the interval x ∈ [a, b].
The third advantage is important only in multiple dimensions. To solve a problem on
an irregularly-shaped domain, one must make a change of coordinates to map the domain
from the physical coordinates (x, y) to a square domain in the computational coordinates
(r, s). One can then expand the solution in a “tensor product” basis, i.e., in basis functions
of the form Tm(r) Tn(s). Unfortunately, the change of coordinates alters the coefficients of
the differential equation by inserting “metric factors” that depend on the mapping. The
transformation of first derivatives is messy but not too bad. The transformation of second
derivatives is very messy (Appendix E, Sec. 9). As for the transformation of third and
fourth derivatives: the programmer shudders and turns away in horror.
The weak form, however, needs only first derivatives even though the original differ-
ential equation is of second order. Similarly, a fourth order differential equation can be
represented in the weak form using no derivatives higher than second order. With the
weak form, it is no longer necessary to drown under a flash flood of the metric factors of
high order derivatives.
This is important not only for preserving the sanity of the programmer. Complicated
metric factors also greatly increase the operation count and slow the computations.
The fourth advantage is that if the Sobolev space is restricted to the span of a finite
number N of linearly independent basis functions, the Galerkin matrix for the weak form
is automatically symmetric:

Hij = (φx,i, φx,j) + (φi, q(x)φj)                                    (3.39)

Because the test functions and basis functions both enter as the first derivative, interchang-
ing the indices i, j does not change the numerical value of the Galerkin matrix for the weak
form.
The symmetry of the Galerkin matrix for the weak form has some important implica-
tions. First, it explains the relative unpopularity of the least squares method, which also
generates symmetric matrices. The Galerkin weak form is also symmetric but much simpler.
Second, symmetric matrices can be solved by special methods which are much cheaper
than the more general algorithms that must be used for nonsymmetric matrices. For ex-
ample, a matrix equation with a symmetric matrix can be solved by a “Cholesky” decom-
position (Appendix B), which requires only half the floating point operations and half the
memory storage of an LU factorization.
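A brief sketch of the saving, using SciPy's `cho_factor`/`cho_solve` routines (a standard SciPy API); the tridiagonal test matrix is an arbitrary symmetric positive definite example, not one from the text:

```python
# Sketch: solving a symmetric positive definite system by Cholesky
# factorization. The tridiagonal K below is an arbitrary SPD example
# (the 1-D stiffness pattern), not a matrix from the text.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

n = 50
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD tridiagonal matrix

c, low = cho_factor(K)          # ~n^3/3 flops, versus ~2n^3/3 for an LU factorization
b = np.ones(n)
u = cho_solve((c, low), b)      # reuse the one factorization for any right-hand side

print(float(np.max(np.abs(K @ u - b))))   # residual of the solve
```

The factorization can be computed once and reused for many right-hand sides, which is exactly the situation in time-stepping codes where the same Galerkin matrix recurs every step.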
If the matrix equation is solved by an iterative scheme, one can use the “conjugate gra-
dient” method if the matrix is symmetric. Its unsymmetric counterpart is the “biconjugate
gradient” method which is roughly twice as expensive per step in both operations and
storage, and also more likely to converge very slowly or fail.
Matrix symmetry requires that the differential equation should be “self-adjoint”. For-
tunately, this is true of many important physical problems.
Lastly, the weak form can be connected with the calculus of variations to justify some
unconventional (but cheap and efficient) schemes for patching solutions on separate sub-
domains together. Consequently, the spectral element method, which is the most popular
scheme for applying spectral methods with domain decomposition, almost always is based
on the “weak” form rather than collocation.
It turns out, however, that the integrals of the weak form are usually approximated
by a Gaussian quadrature scheme which is very closely related to interpolation and the
pseudospectral method as explained in the next chapter.

3.6 Galerkin Method: Case Studies

Consider the equation

uxx − (1/2)u = −(3/2) cos(x) − (9/2) cos(2x)                         (3.40)

with periodic boundary conditions. This is in the form Hu = f with

H ≡ ∂xx − (1/2);    f(x) = −(3/2) cos(x) − (9/2) cos(2x)             (3.41)

Since the boundary conditions are periodic, Fourier functions are the obvious basis
set. We assert without proof that a Fourier cosine series (without the constant) is suffi-
cient for this problem; note that only cosines appear on the R. H. S. [A full discussion is
given in Chap. 8 on parity and symmetry.] The matrix equivalent of (3.41) with the basis
cos(x), cos(2x) is

⎡ (cos[x], H cos[x])    (cos[x], H cos[2x])  ⎤ ⎡ a1 ⎤   ⎡ (cos[x], f)  ⎤
⎣ (cos[2x], H cos[x])   (cos[2x], H cos[2x]) ⎦ ⎣ a2 ⎦ = ⎣ (cos[2x], f) ⎦     (3.42)

where

(g, h) ≡ ∫ dx g(x) h(x)

for any two functions g(x) and h(x).
To evaluate the matrix elements, note that

H cos(nx) ≡ [cos(nx)]xx − (1/2) cos(nx)
          = −{n² + 1/2} cos(nx),    for all n                        (3.43)

and

(cos[mx], cos[nx]) = π δmn    for all m, n > 0                       (3.44)

Thus, (3.42) becomes (after dividing out the common factor of π)

⎡ (−3/2)     0    ⎤ ⎡ a1 ⎤   ⎡ (−3/2) ⎤
⎣    0    (−9/2)  ⎦ ⎣ a2 ⎦ = ⎣ (−9/2) ⎦
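As a numerical check (the midpoint quadrature and the choice of [0, 2π] as the integration interval are assumptions for illustration), assembling and solving the 2 × 2 Galerkin system of (3.42) in Python gives a1 = a2 = 1, so that uN = cos(x) + cos(2x); direct substitution into (3.40) shows this is in fact the exact solution:

```python
# Numerical check of the case study: assemble the 2x2 Galerkin system for
# u_xx - (1/2)u = -(3/2)cos(x) - (9/2)cos(2x). Assumed details: midpoint
# quadrature, integration over one full period [0, 2*pi].
import numpy as np

Nq = 2000
dx = 2.0 * np.pi / Nq
x = dx * (np.arange(Nq) + 0.5)             # midpoint quadrature nodes

def inner(g, h):
    """Midpoint-rule inner product over one period."""
    return float(np.sum(g * h) * dx)

f = -1.5 * np.cos(x) - 4.5 * np.cos(2 * x)
phi = [np.cos(x), np.cos(2 * x)]
# H cos(nx) = -(n^2 + 1/2) cos(nx), as in (3.43).
Hphi = [-(n**2 + 0.5) * phi[n - 1] for n in (1, 2)]

H = np.array([[inner(phi[i], Hphi[j]) for j in range(2)] for i in range(2)])
b = np.array([inner(phi[i], f) for i in range(2)])

a = np.linalg.solve(H, b)
print(a)                                   # a1 = a2 = 1, so u = cos(x) + cos(2x)
```

Because the basis functions are eigenfunctions of H, the Galerkin matrix is diagonal and the truncated series captures the exact answer with only two terms.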

