
1/2 to 1/5 as explained in Chapter 2, Sec. 14] whereas the cost per time step grows linearly with N. However, when the solution has sufficient small-scale variability, large N is needed even for moderate accuracy. Because of the factor-of-N "transform penalty", the pseudospectral method must invariably lose the competition for efficiency at a moderate error tolerance when the required N is sufficiently large.
However, there is an escape: if the basis functions are trigonometric functions, Chebyshev polynomials, or rational Chebyshev functions, then the "transform penalty" can be lowered still further by performing the one-dimensional transforms via the Fast Fourier Transform. This reduces the "transform penalty" for FFT-able functions to log_2 N. Since the logarithm grows very slowly with N, Chebyshev and Fourier pseudospectral methods easily defeat their finite difference brethren in the war of efficiency.
To avoid drowning in formulas, we have only discussed the spectral-to-grid transform
in this section. However, exactly the same trick can be applied in the opposite direction to
calculate the coefficients from f(x_i, y_j).


10.3 The Fast Fourier Transform: Theory
The FFT is restricted to Fourier and Chebyshev functions, but it is extremely powerful
since it removes almost all of the “transform penalty” discussed in the previous section.
The standard FFT has a cost of (assuming N is a power of 2)

N-pt. complex FFT ∼ 5N log_2(N) total real operations          (10.13)

where “total operations” includes both multiplications and additions. To sum the same
one-dimensional series by direct summation requires the multiplication of a column vector
(of coefficients or grid point values) by a square matrix. The cost of this Matrix Multiplication Transform (MMT) is

N-pt. complex matrix transform ∼ 8N^2 real operations          (10.14)

Costs will later be analyzed more carefully, but it is obvious that at least for large N ,
the FFT is much faster than the MMT. In addition, the FFT both vectorizes well (for Cray
supercomputers) and parallelizes well (for massively parallel machines like the IBM SP2).
Therefore, its cost-savings will only become larger as the relentless march to larger and
larger N continues.
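To see how the two estimates compare in practice, here is a minimal Python sketch (an illustration, not from the text; the function names are ours) that tabulates (10.13) and (10.14). The ratio 8N^2 / (5N log_2 N) = 8N / (5 log_2 N) grows without bound, reaching roughly 550 by N = 4096.

    import math

    def fft_ops(N):
        # Eq. (10.13): real operations for a complex FFT of length N (N a power of 2)
        return 5.0 * N * math.log2(N)

    def mmt_ops(N):
        # Eq. (10.14): real operations for a complex matrix-multiplication transform
        return 8.0 * N ** 2

    for N in (8, 64, 512, 4096):
        print(f"N={N:5d}  FFT={fft_ops(N):12.0f}  MMT={mmt_ops(N):12.0f}  "
              f"ratio={mmt_ops(N) / fft_ops(N):6.1f}")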
All software libraries offer a variety of canned FFT routines, so it is never necessary to
write an FFT code from scratch. For example, Numerical Recipes (Press et al., 1986) gives a
complete listing of an FFT code in FORTRAN, C, or Pascal, depending on the edition. Both

one-dimensional and two-dimensional FFTs are built-in commands in MATLAB. And so
it goes. Nevertheless, it is useful to understand the theoretical basis of the FFT, and to
appreciate its similarities to multi-dimensional partial summation.
Define the discrete transform as

X_j = \sum_{k=1}^{N} x_k \exp\left( -i\, \frac{2\pi j k}{N} \right), \qquad j = 1, \ldots, N \qquad (10.15)

and the inverse by

x_k = \frac{1}{N} \sum_{j=1}^{N} X_j \exp\left( i\, \frac{2\pi j k}{N} \right), \qquad k = 1, \ldots, N \qquad (10.16)

(The notation and treatment follow unpublished notes by P. N. Swarztrauber; see Swarztrauber (1986).) What is striking about (10.15) is that it is not in the form of a standard complex Fourier series, for which the sum variable would run over both the positive and negative integers as in

u_j = \sum_{m=-N/2}^{N/2-1} a_m \exp\left( -i\, \frac{2\pi m j}{N} \right), \qquad j = -N/2, \ldots, N/2 - 1 \qquad (10.17)

One reason for this odd convention is that if we introduce the parameter

w \equiv \exp\left( -i\, \frac{2\pi}{N} \right), \qquad\qquad (10.18)

we can rewrite (10.15) as a power series in w:

X_j = \sum_{k=1}^{N} x_k\, w^{jk}, \qquad j = 1, \ldots, N \qquad (10.19)
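For reference, direct summation of (10.19) takes only a few lines of Python (a sketch with our own names; the Python list x is 0-based, so x[k-1] plays the role of x_k in the text). Its cost is O(N^2), which is precisely what the FFT will undercut.

    import cmath

    def direct_transform(x):
        # Direct summation of Eq. (10.19): X_j = sum_{k=1}^{N} x_k w^{jk},
        # where w = exp(-2*pi*i/N).  Cost: O(N^2) operations.
        N = len(x)
        w = cmath.exp(-2j * cmath.pi / N)
        return [sum(x[k - 1] * w ** (j * k) for k in range(1, N + 1))
                for j in range(1, N + 1)]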

Now the key to our success with multi-dimensional sums was that the three-dimensional
basis functions could be factored into the product of three one-dimensional functions. The
key to the FFT is that powers of w can be factored, too.
For simplicity, suppose that N is even. We can then split {xk } into two sequences of
length N/2 which consist of the even and odd parts, respectively:
y_k \equiv x_{2k}; \qquad z_k \equiv x_{2k-1}, \qquad k = 1, \ldots, N/2 \qquad (10.20)
These have their own transforms which are
Y_j \equiv \sum_{k=1}^{N/2} y_k\, w^{2jk}; \qquad Z_j \equiv \sum_{k=1}^{N/2} z_k\, w^{2jk}, \qquad j = 1, \ldots, N/2 \qquad (10.21)

If we could somehow deduce the transform of the original sequence from {Y_j} and {Z_j}, we would save a factor of 2: when evaluated by matrix multiplication, the cost of a transform of length N is N^2 operations, so the two half-transforms (10.21) cost N^2/4 + N^2/4 = N^2/2.
We can retrieve the transform of the original sequence X_j by segregating the even and odd terms in (10.19) and then applying the definitions of the half-transforms (10.21):

X_j = \sum_{k=1}^{N/2} \left[ x_{2k}\, w^{j(2k)} + x_{2k-1}\, w^{j(2k-1)} \right] \qquad (10.22)

\phantom{X_j} = Y_j + w^{-j} Z_j, \qquad j = 1, \ldots, N/2 \qquad (10.23)


Table 10.2: A Selected Bibliography on the Fast Fourier Transform

References                 Comments
Carse & Urquhart (1914)    [Historical] Early but efficient FFTs ("Runge grouping");
                           printed computing forms used in place of digital computers
Deville & Labrosse (1982)  FORTRAN listings and a little theory for the Chebyshev FFT
                           in one, two, and three space dimensions
Brachet et al. (1983)      Efficient odd cosine transform
Temperton (1983a)          Explains why inclusion of factors of 4 and 6 in the factorization
                           of N reduces cost by 15% for N = 64;
                           also shows simultaneous computation of many transforms
                           is highly vectorizable and parallelizable
Temperton (1983b)          Fast real-valued transforms
Temperton (1985, 1992)     General prime-number factorization of N
Press et al. (1986)        Program listing for FFT code: C, FORTRAN, or Pascal
(Numerical Recipes)
Swarztrauber (1986)        Efficient cosine (Chebyshev) and sine transforms
Canuto et al. (1988)       Appendix B with detailed theory and
                           FORTRAN programs for cosine and Chebyshev transforms
Swarztrauber (1987)        Parallel FFT
van Loan (1992)            Highly-regarded monograph on the FFT
Swarztrauber (1993)        Vector spherical harmonics; also cosine & sine transforms
Pelz (1993)                Parallel FFTs for real data
Briggs & Henson (1995)     Excellent, readable book on the FFT and its applications
Fornberg (1996)            Careful discussion of the Chebyshev transform, etc.




There is one minor snag: Y_j and Z_j are only defined for j ≤ N/2. However,

w^{-j-N/2} = -w^{-j},

which implies

X_{j+N/2} = Y_j - w^{-j} Z_j, \qquad j = 1, \ldots, N/2 \qquad (10.24)

Together, (10.23) and (10.24) give X_j for j = 1, \ldots, N from Y_j and Z_j, which are defined only for j ≤ N/2.
Thus, we can indeed save a factor of two (for N even) by splitting the transform into
two parts and computing the partial sums of each. The trick is the same as for the two-
dimensional sums of the previous section if the “y” coordinate is imagined as having only
two degrees of freedom.
If N = 2^M, then we can repeat this factorizing trick M times (until the Z_j and Y_j are trivial transforms of length 1), and save roughly a factor of 2 at each step. (A more careful count reveals the \log_2(N) factor in (10.13).)
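The whole recursion fits in a dozen lines of Python. The sketch below (ours, not the book's) uses the 0-based indexing of most software libraries, so it reproduces numpy.fft.fft rather than the 1-based convention of (10.15), from which it differs only by a phase factor; the two butterfly lines are the 0-based analogues of (10.23) and (10.24).

    import cmath

    def fft(x):
        # Radix-2 FFT: X_j = sum_{k=0}^{N-1} x_k exp(-2*pi*i*j*k/N), N a power of 2.
        N = len(x)
        if N == 1:
            return list(x)              # a length-1 transform is the identity
        Y = fft(x[0::2])                # transform of the even-indexed subsequence
        Z = fft(x[1::2])                # transform of the odd-indexed subsequence
        X = [0j] * N
        for j in range(N // 2):
            t = cmath.exp(-2j * cmath.pi * j / N) * Z[j]   # twiddle factor times Z_j
            X[j] = Y[j] + t                                # analogue of Eq. (10.23)
            X[j + N // 2] = Y[j] - t                       # analogue of Eq. (10.24)
        return X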
We can modify this factorizing trick by separating the original sequence into three parts
of length N/3, or five parts of length N/5, or seven parts of length N/7, etc., if N is divisible
by 3, 5, or 7, respectively. The most general library subroutines will factor an arbitrary
N into primes and then subdivide the sums appropriately. The factoring of N requires
additional overhead, and the splitting of the transform saves less and less time as the prime
factors become larger. Consequently, most hydrodynamics codes choose N to be a power
of 2, or a power of 2 multiplied by a single factor of 3.
The articles and books in Table 10.2 give additional theory and advice.

10.4 Matrix Multiplication Transform (MMT) and Matrix Multiplication Transform with Parity (PMMT)
In the grid-point-to-spectral-coefficients direction, the Matrix Multiplication Transform (MMT) is simply Gaussian quadrature. For example, a four-point transform would be

\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}
=
\begin{pmatrix}
w_1 \phi_0(x_1) & w_2 \phi_0(x_2) & w_3 \phi_0(x_3) & w_4 \phi_0(x_4) \\
w_1 \phi_1(x_1) & w_2 \phi_1(x_2) & w_3 \phi_1(x_3) & w_4 \phi_1(x_4) \\
w_1 \phi_2(x_1) & w_2 \phi_2(x_2) & w_3 \phi_2(x_3) & w_4 \phi_2(x_4) \\
w_1 \phi_3(x_1) & w_2 \phi_3(x_2) & w_3 \phi_3(x_3) & w_4 \phi_3(x_4)
\end{pmatrix}
\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix}

where the w_j are the Gaussian quadrature weights multiplied by normalization factors and the \phi_j(x) are the basis functions, which could be Fourier, Chebyshev, Hermite, Associated Legendre, etc. (The normalization factors are chosen so that the square matrix above is the inverse of the square matrix below, i.e., such that a_j = 1 with all other coefficients zero when f(x) = \phi_j(x); numerical values are given in the Appendices.)
The backward, spectral-to-grid-point MMT is merely summation of the interpolant, such as

\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix}
=
\begin{pmatrix}
\phi_0(x_1) & \phi_1(x_1) & \phi_2(x_1) & \phi_3(x_1) \\
\phi_0(x_2) & \phi_1(x_2) & \phi_2(x_2) & \phi_3(x_2) \\
\phi_0(x_3) & \phi_1(x_3) & \phi_2(x_3) & \phi_3(x_3) \\
\phi_0(x_4) & \phi_1(x_4) & \phi_2(x_4) & \phi_3(x_4)
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}

The cost of an MMT is that of a single matrix-vector multiply: 8N^2 real operations for complex-valued matrices and vectors, and 2N^2 operations for real-valued matrices and vectors. However, these cost estimates are a bit misleading. Vector supercomputers and RISC workstations have the hardware to perform matrix-vector multiplies very efficiently. Furthermore, because the matrix-vector multiply is a key building block of many algorithms, assembly-language routines to perform it are available on many machines, which roughly halves the execution time.
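As a concrete instance of both matrices, here is a minimal NumPy sketch (ours; N = 8 and the test function are arbitrary choices) of the MMT pair for the Chebyshev basis on the Chebyshev-Gauss "roots" grid, where the quadrature weights and normalization factors have simple closed forms; weights for other basis sets are tabulated in the Appendices.

    import numpy as np

    N = 8
    theta = np.pi * (np.arange(N) + 0.5) / N     # Chebyshev-Gauss angles
    x = np.cos(theta)                            # grid points x_i

    # Backward (spectral-to-grid) matrix: Phi[i, n] = T_n(x_i) = cos(n*theta_i)
    Phi = np.cos(np.outer(theta, np.arange(N)))

    # Forward (grid-to-spectral) matrix: quadrature weights times normalization
    # factors, chosen so that W @ Phi is the identity matrix
    W = (2.0 / N) * Phi.T
    W[0, :] *= 0.5                               # the a_0 row carries an extra 1/2

    f = np.exp(x)                                # sample a smooth f on the grid
    a = W @ f                                    # forward MMT: one O(N^2) multiply
    print(np.max(np.abs(Phi @ a - f)))           # round trip exact to rounding error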
The cost of the MMT can be halved for any basis set whose elements have definite parity, giving the faster PMMT algorithm.


Theorem 28 (PARITY MATRIX MULTIPLICATION (PMMT)) If all the elements of a basis set have definite parity, that is, are all either symmetric or antisymmetric with respect to x = 0, then one may halve the cost of a matrix multiplication transform from grid space to spectral coefficients or vice versa by using the following algorithm.

Suppose that the even degree basis functions are symmetric while the odd degree are antisymmetric with respect to x = 0. Sum the series

f_S(x_i) = \sum_{n=0}^{N/2} a_{2n}\, \phi_{2n}(x_i) \qquad \& \qquad f_A(x_i) = \sum_{n=0}^{N/2} a_{2n+1}\, \phi_{2n+1}(x_i) \qquad (10.25)

at half of the grid points: all those for positive x, for example. The sum of both halves of the series at all points of the grid is then given by

f(x_i) = \begin{cases} f_S(x_i) - f_A(x_i), & i = 1, \ldots, N/2 \\ f_S(x_i) + f_A(x_i), & i = (N/2) + 1, \ldots, N \end{cases} \qquad (10.26)


[Figure 10.1: grid point values f_S = f_4, (f_3+f_5)/2, (f_2+f_6)/2, (f_1+f_7)/2, f_8 and f_A = (f_5-f_3)/2, (f_6-f_2)/2, (f_7-f_1)/2 on the grid x = 0, ±π/4, ±π/2, ±3π/4, π.]

Figure 10.1: Schematic of how the grid point values of a general function f(x) can be decomposed into grid point values of its symmetric part f_S(x) and its antisymmetric part f_A(x). In the case illustrated, the eight-point interpolant of f by a weighted sum of {1, cos(x), cos(2x), cos(3x), cos(4x), sin(x), sin(2x), sin(3x)} is replaced by the five-point interpolant of f_S(x) by {1, cos(x), cos(2x), cos(3x), cos(4x)} and the three-point interpolant of f_A(x) by {sin(x), sin(2x), sin(3x)}.


The cost is only O(N^2/2) multiplications plus an equal number of additions (ignoring terms linear in N). This is half the cost of the standard N-point transform because we have split the problem into two transforms of length N/2.

This "parity transform" is applicable to Legendre polynomials, Hermite functions, the rational Chebyshev functions TB_n(x) on the infinite interval, and all other basis sets whose elements are all either even or odd with respect to x = 0. It is not applicable to Laguerre functions or the rational Chebyshev functions on the semi-infinite interval, TL_n(x).

PROOF: As noted in Chapter 2, an arbitrary function f (x) may always be split into its
symmetric and antisymmetric parts, fS (x) and fA (x), respectively. fS (x) is the sum of the
even degree basis functions only, so it may be computed by summing only (N/2) terms,
and similarly for the antisymmetric component.
Fig. 10.1 illustrates the PMMT in the opposite direction, grid point values {f_j} to spectral coefficients. The N = 4 general MMTs given above simplify to

\begin{pmatrix} a_0 \\ a_2 \end{pmatrix}
=
\begin{pmatrix} w_3 \phi_0(x_3) & w_4 \phi_0(x_4) \\ w_3 \phi_2(x_3) & w_4 \phi_2(x_4) \end{pmatrix}
\begin{pmatrix} f_2 + f_3 \\ f_1 + f_4 \end{pmatrix}

and

\begin{pmatrix} a_1 \\ a_3 \end{pmatrix}
=
\begin{pmatrix} w_3 \phi_1(x_3) & w_4 \phi_1(x_4) \\ w_3 \phi_3(x_3) & w_4 \phi_3(x_4) \end{pmatrix}
\begin{pmatrix} f_3 - f_2 \\ f_4 - f_1 \end{pmatrix}
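A self-contained NumPy sketch of this grid-to-coefficient PMMT (ours, reusing the Chebyshev roots grid of the previous sketch, which is symmetric about x = 0 with x_{N-1-i} = -x_i; grids that contain a point at x = 0, like the Fourier grid of Fig. 10.1, must treat that center point separately):

    import numpy as np

    N = 8
    theta = np.pi * (np.arange(N) + 0.5) / N         # Chebyshev-Gauss angles
    f = np.exp(np.cos(theta))                        # sample f at x_i = cos(theta_i)

    half = N // 2
    fs = 0.5 * (f[:half] + f[::-1][:half])           # f_S at the N/2 points with x > 0
    fa = 0.5 * (f[:half] - f[::-1][:half])           # f_A at the same points

    # Two half-size forward matrices: even-degree rows act on f_S, odd on f_A
    th = theta[:half]
    We = (2.0 / N) * np.cos(np.outer(np.arange(0, N, 2), th))
    We[0, :] *= 0.5                                  # the a_0 row carries an extra 1/2
    Wo = (2.0 / N) * np.cos(np.outer(np.arange(1, N, 2), th))

    a_even = We @ (2.0 * fs)                         # a_0, a_2, ...: (N/2)^2 multiply
    a_odd = Wo @ (2.0 * fa)                          # a_1, a_3, ...: (N/2)^2 multiply

    # Check against the full N x N MMT of the previous sketch
    Phi = np.cos(np.outer(theta, np.arange(N)))
    W = (2.0 / N) * Phi.T
    W[0, :] *= 0.5
    a = W @ f
    print(np.max(np.abs(a_even - a[0::2])), np.max(np.abs(a_odd - a[1::2])))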
