early with N. However, when the solution has sufficient small-scale variability, large N is needed even for moderate accuracy. Because of the factor-of-N "transform penalty", the pseudospectral method must invariably lose the competition for efficiency at a moderate error tolerance when the required N is sufficiently large.

However, there is an escape: if the basis functions are trigonometric functions, Chebyshev polynomials, or rational Chebyshev functions, then the "transform penalty" can be lowered still further by performing the one-dimensional transforms via the Fast Fourier Transform. This reduces the "transform penalty" for FFT-able functions to $\log_2 N$. Since the logarithm grows very slowly with N, Chebyshev and Fourier pseudospectral methods easily defeat their finite difference brethren in the war of efficiency.

To avoid drowning in formulas, we have only discussed the spectral-to-grid transform in this section. However, exactly the same trick can be applied in the opposite direction to calculate the coefficients from $f(x_i, y_j)$.

10.3 The Fast Fourier Transform: Theory

The FFT is restricted to Fourier and Chebyshev functions, but it is extremely powerful since it removes almost all of the "transform penalty" discussed in the previous section.

The standard FFT has a cost of (assuming N is a power of 2)

$$ N\text{-pt. complex FFT} \sim 5 N \log_2(N) \quad \text{total real operations} \qquad (10.13) $$

where "total operations" includes both multiplications and additions. To sum the same one-dimensional series by direct summation requires the multiplication of a column vector (of coefficients or grid point values) by a square matrix. The cost of this Matrix Multiplication Transform (MMT) is

$$ N\text{-pt. complex matrix transform} \sim 8 N^2 \quad \text{real operations} \qquad (10.14) $$
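The gap between these two estimates is easy to tabulate. The short sketch below is illustrative only: the formulas are (10.13) and (10.14) from the text, while the helper names are ours.

```python
import math

# Illustrative sketch: tabulate the operation-count estimates (10.13) and
# (10.14) for several transform sizes N (powers of 2).

def fft_ops(n):
    """Estimated real operations for an N-point complex FFT, Eq. (10.13)."""
    return 5 * n * math.log2(n)

def mmt_ops(n):
    """Estimated real operations for an N-point complex MMT, Eq. (10.14)."""
    return 8 * n * n

for n in (16, 64, 256, 1024):
    print(f"N={n:5d}  FFT ~ {fft_ops(n):9.0f}  MMT ~ {mmt_ops(n):9.0f}  "
          f"ratio ~ {mmt_ops(n) / fft_ops(n):6.1f}")
```

Even at the modest N = 1024, the estimated ratio already exceeds two orders of magnitude.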

Costs will later be analyzed more carefully, but it is obvious that at least for large N, the FFT is much faster than the MMT. In addition, the FFT both vectorizes well (for Cray supercomputers) and parallelizes well (for massively parallel machines like the IBM SP2). Therefore, its cost-savings will only become larger as the relentless march to larger and larger N continues.

All software libraries offer a variety of canned FFT routines, so it is never necessary to write an FFT code from scratch. For example, Numerical Recipes (Press et al., 1986) gives a complete listing of an FFT code in FORTRAN, C, or Pascal, depending on the edition. Both one-dimensional and two-dimensional FFTs are built-in commands in MATLAB. And so it goes. Nevertheless, it is useful to understand the theoretical basis of the FFT, and to appreciate its similarities to multi-dimensional partial summation.

Define the discrete transform as

$$ X_j = \sum_{k=1}^{N} x_k \exp\left( -i \, \frac{2\pi j k}{N} \right), \qquad j = 1, \dots, N \qquad (10.15) $$

and the inverse by

$$ x_k = \frac{1}{N} \sum_{j=1}^{N} X_j \exp\left( i \, \frac{2\pi j k}{N} \right), \qquad k = 1, \dots, N \qquad (10.16) $$
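A minimal sketch of the pair (10.15)-(10.16) by direct summation may make the conventions concrete. The code below keeps the text's 1-based summation indices; it is illustrative only, since direct summation is exactly the O(N²) approach that the FFT will replace.

```python
import cmath

def forward(x):
    """X_j = sum_{k=1}^{N} x_k exp(-2*pi*i*j*k/N), j = 1, ..., N; Eq. (10.15)."""
    n = len(x)
    return [sum(x[k - 1] * cmath.exp(-2 * cmath.pi * 1j * j * k / n)
                for k in range(1, n + 1))
            for j in range(1, n + 1)]

def inverse(X):
    """x_k = (1/N) sum_{j=1}^{N} X_j exp(+2*pi*i*j*k/N), k = 1, ..., N; Eq. (10.16)."""
    n = len(X)
    return [sum(X[j - 1] * cmath.exp(2 * cmath.pi * 1j * j * k / n)
                for j in range(1, n + 1)) / n
            for k in range(1, n + 1)]

# Round trip: the inverse transform recovers the original sequence.
x = [1.0, 2.0, -0.5, 3.0]
assert all(abs(a - b) < 1e-12 for a, b in zip(inverse(forward(x)), x))
```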

(The notation and treatment follow unpublished notes by P. N. Swarztrauber; see Swarztrauber (1986).) What is striking about (10.15) is that it is not in the form of a standard complex Fourier series, for which the sum variable would run over both the positive and negative integers as in

$$ u_j = \sum_{m=-N/2}^{N/2-1} a_m \exp\left( -i \, \frac{2\pi m j}{N} \right), \qquad j = -N/2, \dots, N/2 - 1 \qquad (10.17) $$

One reason for this odd convention is that if we introduce the parameter

$$ w \equiv \exp\left( -i \, \frac{2\pi}{N} \right), \qquad (10.18) $$

we can rewrite (10.15) as a power series in w:

$$ X_j = \sum_{k=1}^{N} x_k \, w^{jk}, \qquad j = 1, \dots, N \qquad (10.19) $$
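For a fixed j, (10.19) says that X_j is a polynomial in w^j, so each coefficient can be evaluated by Horner's rule. The small sketch below (the function name is ours) is illustrative only:

```python
import cmath

def X_of(x, j):
    """Evaluate X_j = sum_{k=1}^{N} x_k (w^j)^k by Horner's rule, w = exp(-2*pi*i/N)."""
    n = len(x)
    wj = cmath.exp(-2 * cmath.pi * 1j * j / n)   # the factor w^j
    acc = 0j
    for xk in reversed(x):                       # Horner: (...((x_N) wj + x_{N-1}) wj ...) wj
        acc = (acc + xk) * wj
    return acc
```

Horner's rule by itself does not beat the O(N) cost per coefficient; the savings come from factoring the powers of w, as explained next.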

Now the key to our success with multi-dimensional sums was that the three-dimensional basis functions could be factored into the product of three one-dimensional functions. The key to the FFT is that powers of w can be factored, too.

For simplicity, suppose that N is even. We can then split {x_k} into two sequences of length N/2 which consist of the even and odd parts, respectively:

$$ y_k \equiv x_{2k}; \qquad z_k \equiv x_{2k-1}, \qquad k = 1, \dots, N/2 \qquad (10.20) $$

These have their own transforms, which are

$$ Y_j \equiv \sum_{k=1}^{N/2} y_k \, w^{2jk}; \qquad Z_j \equiv \sum_{k=1}^{N/2} z_k \, w^{2jk}, \qquad j = 1, \dots, N/2 \qquad (10.21) $$

If we could somehow deduce the transform of the original sequence from {Y_j} and {Z_j}, we would save a factor of two: when evaluated by matrix multiplication, the cost of a transform of length N is N² operations, so the two half-transforms (10.21) cost N²/4 + N²/4 = N²/2.

We can retrieve the transform of the original sequence X_j by segregating the even and odd terms in (10.19) and then applying the definitions of the half-transforms (10.21):

$$ X_j = \sum_{k=1}^{N/2} \left[ x_{2k} \, w^{j(2k)} + x_{2k-1} \, w^{j(2k-1)} \right] \qquad (10.22) $$

$$ \phantom{X_j} = Y_j + w^{-j} Z_j, \qquad j = 1, \dots, N/2 \qquad (10.23) $$


Table 10.2: A Selected Bibliography on the Fast Fourier Transform

Carse & Urquhart (1914): [Historical] Early but efficient FFTs ("Runge grouping"); printed computing forms used in place of digital computers
Deville & Labrosse (1982): FORTRAN listings and a little theory for Chebyshev FFT in one, two, and three space dimensions
Brachet et al. (1983): Efficient odd cosine transform
Temperton (1983a): Explains why inclusion of factors of 4 and 6 in factorization of N reduces cost by 15% for N = 64; also shows simultaneous computation of many transforms is highly vectorizable and parallelizable
Temperton (1983b): Fast real-valued transforms
Temperton (1985, 1992): General prime-number factorization of N
Press et al. (1986): Program listing for FFT code: C, FORTRAN or Pascal (Numerical Recipes)
Swarztrauber (1986): Efficient cosine (Chebyshev) and sine transforms
Canuto et al. (1988): Appendix B with detailed theory and FORTRAN programs for cosine, Chebyshev transforms
Swarztrauber (1987): Parallel FFT
van Loan (1992): Highly-regarded monograph on FFT
Swarztrauber (1993): Vector spherical harmonics; also cosine & sine transforms
Pelz (1993): Parallel FFTs for real data
Briggs & Henson (1995): Excellent, readable book on the FFT and its applications
Fornberg (1996): Careful discussion of Chebyshev transform, etc.

There is one minor snag: Y_j and Z_j are only defined for j ≤ N/2. However,

$$ w^{-j - N/2} = -w^{-j}, $$

which implies

$$ X_{j+N/2} = Y_j - w^{-j} Z_j, \qquad j = 1, \dots, N/2 \qquad (10.24) $$

Together, (10.23) and (10.24) give X_j for j = 1, ..., N from Y_j and Z_j, which are defined only for j ≤ N/2.

Thus, we can indeed save a factor of two (for N even) by splitting the transform into two parts and computing the partial sums of each. The trick is the same as for the two-dimensional sums of the previous section if the "y" coordinate is imagined as having only two degrees of freedom.

If $N = 2^M$, then we can repeat this factorizing trick M times (until the Z_j and Y_j are trivial transforms of length one), and save roughly a factor of two at each step. (A more careful count reveals the $\log_2(N)$ factor in (10.13).)
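The repeated factorization can be sketched as a short recursive routine. The sketch below uses the standard 0-based DFT convention, X_j = Σ_{k=0}^{N-1} x_k exp(-2πijk/N), rather than the text's 1-based one (the two differ only by a phase factor w^j per coefficient); it is illustrative, not a production FFT.

```python
import cmath

def fft_radix2(x):
    """Radix-2 FFT sketch for len(x) a power of 2 (0-based DFT convention)."""
    n = len(x)
    if n == 1:
        return list(x)                       # a length-1 transform is trivial
    Y = fft_radix2(x[0::2])                  # transform of the even-indexed part
    Z = fft_radix2(x[1::2])                  # transform of the odd-indexed part
    X = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(-2 * cmath.pi * 1j * j / n) * Z[j]   # twiddle factor times Z_j
        X[j] = Y[j] + t                      # first half, cf. (10.23)
        X[j + n // 2] = Y[j] - t             # second half, cf. (10.24)
    return X
```

Each level of recursion does O(N) work and there are log₂(N) levels, which is where the N log₂(N) operation count comes from.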

We can modify this factorizing trick by separating the original sequence into three parts of length N/3, or five parts of length N/5, or seven parts of length N/7, etc., if N is divisible by 3, 5, or 7, respectively. The most general library subroutines will factor an arbitrary N into primes and then subdivide the sums appropriately. The factoring of N requires additional overhead, and the splitting of the transform saves less and less time as the prime factors become larger. Consequently, most hydrodynamics codes choose N to be a power of 2, or a power of 2 multiplied by a single factor of 3.
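That preference for N is easy to check in a few lines. The helper below (its name is ours, purely illustrative) tests whether N is a power of 2, possibly times a single factor of 3:

```python
def is_preferred_n(n):
    """True if n is a power of 2, or a power of 2 times a single factor of 3."""
    if n < 1:
        return False
    if n % 3 == 0:
        n //= 3          # allow at most one factor of 3
    while n % 2 == 0:
        n //= 2
    return n == 1

assert is_preferred_n(64)        # 2^6
assert is_preferred_n(96)        # 2^5 * 3
assert not is_preferred_n(100)   # 100 = 2^2 * 5^2 has factors of 5
```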

The articles and books in Table 10.2 give additional theory and advice.


10.4 Matrix Multiplication Transform (MMT) and Matrix Multiplication Transform with Parity (PMMT)

In the grid-point-to-spectral-coefficients direction, the Matrix Multiplication Transform (MMT) is simply Gaussian quadrature. For example, a four-point transform would be

$$ \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} w_1 \phi_0(x_1) & w_2 \phi_0(x_2) & w_3 \phi_0(x_3) & w_4 \phi_0(x_4) \\ w_1 \phi_1(x_1) & w_2 \phi_1(x_2) & w_3 \phi_1(x_3) & w_4 \phi_1(x_4) \\ w_1 \phi_2(x_1) & w_2 \phi_2(x_2) & w_3 \phi_2(x_3) & w_4 \phi_2(x_4) \\ w_1 \phi_3(x_1) & w_2 \phi_3(x_2) & w_3 \phi_3(x_3) & w_4 \phi_3(x_4) \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix} $$

where the w_j are the Gaussian quadrature weights multiplied by normalization factors and the φ_j(x) are the basis functions, which could be Fourier, Chebyshev, Hermite, Associated Legendre, etc. (The normalization factors are chosen so that the square matrix above is the inverse of the square matrix below, i.e., such that a_j = 1 and all other coefficients are zero when f(x) = φ_j(x); numerical values are given in the Appendices.)

The backward, spectral-to-grid-point MMT is merely summation of the interpolant, such as

$$ \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix} = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \phi_2(x_1) & \phi_3(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \phi_2(x_2) & \phi_3(x_2) \\ \phi_0(x_3) & \phi_1(x_3) & \phi_2(x_3) & \phi_3(x_3) \\ \phi_0(x_4) & \phi_1(x_4) & \phi_2(x_4) & \phi_3(x_4) \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} $$

The cost of an MMT is that of a single matrix-vector multiply: 8N² real operations for complex-valued matrices and vectors, and 2N² operations for real-valued matrices and vectors. However, these cost estimates are a bit misleading. Vector supercomputers and RISC workstations have the hardware to perform matrix-vector multiplies very efficiently. Furthermore, because the matrix-vector multiply is a key building block of many algorithms, assembly-language routines to perform it are available on many machines, which roughly halves the execution time.
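A minimal sketch of the MMT pair as explicit matrix-vector multiplies follows. The Chebyshev basis φ_j(x) = cos(j arccos x), the Chebyshev-Gauss grid, and the use of a numerical inverse in place of tabulated quadrature weights are all our illustrative choices; the inverse is justified by the parenthetical remark above that the forward matrix is the inverse of the backward one.

```python
import numpy as np

# Illustrative 4-point MMT sketch with a Chebyshev basis on the
# Chebyshev-Gauss grid. The forward (grid-to-spectral) matrix is computed
# as the numerical inverse of the backward (interpolation) matrix.
n = 4
x = np.cos(np.pi * (2 * np.arange(1, n + 1) - 1) / (2 * n))      # Gauss points
B = np.array([[np.cos(j * np.arccos(xi)) for j in range(n)] for xi in x])
F = np.linalg.inv(B)                    # forward MMT matrix: f -> a

a = np.array([0.5, -1.0, 2.0, 0.25])    # made-up spectral coefficients
f = B @ a                               # backward MMT: coefficients -> grid values
assert np.allclose(F @ f, a)            # forward MMT recovers the coefficients
```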

The cost of the MMT can be halved for any basis set whose elements have definite parity, giving the faster PMMT algorithm.

Theorem 28 (PARITY MATRIX MULTIPLICATION (PMMT)) If all the elements of a basis set have definite parity, that is, are all either symmetric or antisymmetric with respect to x = 0, then one may halve the cost of a matrix multiplication transform from grid space to spectral coefficients, or vice versa, by using the following algorithm.

Suppose that the even degree basis functions are symmetric while the odd degree are antisymmetric with respect to x = 0. Sum the series

$$ f_S(x_i) = \sum_{n=0}^{N/2} a_{2n} \, \phi_{2n}(x_i) \quad \& \quad f_A(x_i) = \sum_{n=0}^{N/2} a_{2n+1} \, \phi_{2n+1}(x_i) \qquad (10.25) $$

at half of the grid points: all those for positive x, for example. The sum of both halves of the series at all points of the grid is then given by

$$ f(x_i) = \begin{cases} f_S(x_i) - f_A(x_i), & i = 1, \dots, N/2 \\ f_S(x_i) + f_A(x_i), & i = (N/2) + 1, \dots, N \end{cases} \qquad (10.26) $$


[Figure 10.1 appears here: the eight grid point values f_1, ..., f_8 at x = -3π/4, -π/2, -π/4, 0, π/4, π/2, 3π/4, π are combined into f_S = {f_4, (f_3+f_5)/2, (f_2+f_6)/2, (f_1+f_7)/2, f_8} at x = 0, π/4, π/2, 3π/4, π and f_A = {(f_5-f_3)/2, (f_6-f_2)/2, (f_7-f_1)/2} at x = π/4, π/2, 3π/4.]

Figure 10.1: Schematic of how the grid point values of a general function f(x) can be decomposed into grid point values of its symmetric part f_S(x) and its antisymmetric part f_A. In the case illustrated, the eight-point interpolant of f by a weighted sum of {1, cos(x), cos(2x), cos(3x), cos(4x), sin(x), sin(2x), sin(3x)} is replaced by the five-point interpolant of f_S(x) by {1, cos(x), cos(2x), cos(3x), cos(4x)} and the three-point interpolant of f_A(x) by {sin(x), sin(2x), sin(3x)}.

The cost is only O(N²/2) multiplications plus an equal number of additions (ignoring terms linear in N). This is half the cost of the standard N-point transform because we have split the problem into two transforms of length N/2.

This "parity transform" is applicable to Legendre polynomials, Hermite functions, the rational Chebyshev functions TB_n(x) on the infinite interval, and all other basis sets whose elements are all either even or odd with respect to x = 0. It is not applicable to Laguerre functions or to the rational Chebyshev functions on the semi-infinite interval, TL_n(x).

PROOF: As noted in Chapter 2, an arbitrary function f(x) may always be split into its symmetric and antisymmetric parts, f_S(x) and f_A(x), respectively. f_S(x) is the sum of the even degree basis functions only, so it may be computed by summing only (N/2) terms, and similarly for the antisymmetric component.

Fig. 10.1 illustrates the PMMT in the opposite direction, grid point values {f_j} to spectral coefficients. The N = 4 general MMTs given above simplify to

$$ \begin{pmatrix} a_0 \\ a_2 \end{pmatrix} = \begin{pmatrix} w_3 \phi_0(x_3) & w_4 \phi_0(x_4) \\ w_3 \phi_2(x_3) & w_4 \phi_2(x_4) \end{pmatrix} \begin{pmatrix} f_2 + f_3 \\ f_1 + f_4 \end{pmatrix} $$

and

$$ \begin{pmatrix} a_1 \\ a_3 \end{pmatrix} = \begin{pmatrix} w_3 \phi_1(x_3) & w_4 \phi_1(x_4) \\ w_3 \phi_3(x_3) & w_4 \phi_3(x_4) \end{pmatrix} \begin{pmatrix} f_3 - f_2 \\ f_4 - f_1 \end{pmatrix} $$
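Theorem 28 in the spectral-to-grid direction can be sketched as follows, with Chebyshev polynomials (even degree symmetric, odd degree antisymmetric) on a grid symmetric about x = 0. The grid, sizes, and coefficients are our illustrative choices, and the sign pattern of (10.26) appears here in the form appropriate to this particular grid ordering (positive-x points first).

```python
import numpy as np

# PMMT sketch (spectral-to-grid): sum the even- and odd-degree halves of a
# Chebyshev series only at the positive-x half of a symmetric grid, then
# combine the two length-N/2 sums to get f at all N points.
n = 8
x = np.cos(np.pi * (2 * np.arange(1, n + 1) - 1) / (2 * n))  # symmetric grid: x[k] = -x[n-1-k]
a = np.array([1.0, -0.5, 2.0, 0.3, -1.2, 0.7, 0.1, -0.4])    # made-up coefficients

def T(j, t):
    """Chebyshev polynomial T_j(t) = cos(j arccos t)."""
    return np.cos(j * np.arccos(t))

xp = x[:n // 2]                                      # the positive-x half of the grid
fS = sum(a[j] * T(j, xp) for j in range(0, n, 2))    # symmetric (even-degree) half-sum
fA = sum(a[j] * T(j, xp) for j in range(1, n, 2))    # antisymmetric (odd-degree) half-sum

# Combine per the idea of Eq. (10.26): f = fS + fA at positive x,
# fS - fA at the mirror-image negative points.
f_full = np.concatenate([fS + fA, (fS - fA)[::-1]])
direct = sum(a[j] * T(j, x) for j in range(n))       # full N-term sum, for checking
assert np.allclose(f_full, direct)
```

The two half-sums each cost (N/2)² multiplications, for N²/2 in all, versus N² for the unsplit sum.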
