vection, but it is greatly weakened, and the inherent dissipation is suf¬cient to suppress

aliasing instability. Nevertheless, we shall describe de-aliased models in what follows.

With a total of 2N + 1 degrees of freedom in longitude, we need at least 2N + 1 grid

points in », but it is customary to use 3N + 1 points so as to eliminate aliasing according

to the 3/2™s rule. In colatitude, each spherical harmonic in a triangular truncation is a

polynomial in cos(θ) [or such a polynomial multiplied by sin(θ)] of degree at most N . For

m = ±N , the degree of the polynomial is furnished entirely by the sin|m| (θ) factor; for

m = 0, it is the degree of the Gegenbauer polynomial, but for all m, the degree is at most N .

In expanding the quadratically nonlinear terms, we must evaluate integrals of the product

of three harmonics, so the total degree of the integrands we must perform is never larger

than 3N . It follows that we need a minimum of (3/2)N grid points in colatitude to evaluate

these integrals exactly and thus again avoid aliasing.

18.12.2 Substitutes and Accelerators for the MMT

Spherical harmonics have one great fault: slow transforms for Associated Legendre func-

tions in latitude. This vice has has inspired a quest for alternatives which has lasted more

than three decades and counting. In longitude, the spherical harmonics are just a Fourier

series. All grid-to-coef¬cient and coef¬cient-to-grid transforms for spherical harmonics

(and alternatives to them) employ a Fourier series in longitude and the Fast Fourier Trans-

form in this coordinate. It is only the transform in latitude which is troublesome because

the FFT is inapplicable to Associated Legendre series.

The reason that the Legendre transform is important is that spectral algorithms require

repeated jumps from grid point values to coef¬cients and back again at each time step. At

the end of the twentieth century, spherical harmonic weather forecasting models use the

Matrix Multiplication Transform (MMT) to jump from Legendre coef¬cients to grid point

values and back again. For each zonal wavenumber m, this multiplication of a vector by a

dense matrix requires O(N 2 ) operations. It follows that the cost is very large compared to

the O(N log2 (N )) expense of an FFT. For many years, there has been a feeling that eventu-

CHAPTER 18. SPHERICAL & CYLINDRICAL GEOMETRY

404

ally N would grow so large that spherical harmonics models would become prohibitively

expensive, forcing a switch to ¬nite elements or other non-spectral algorithms.

Reality is different from expectation. The highest resolution forecasting code at the time

of writing (1998) is the T639 model of the European Centre for Medium-Range Weather

Forecasting (ECMWF), which is being tested but will not be operational for another couple

of years. Although N = 639, the Legendre transforms still consume only 19% of the run-

ning time of the model. How is this possible? The short answer is that even though there is

no algorithm as nifty as the FFT for Legendre series, there are several tricks for accelerating

the MMT which have so far saved spherical harmonics from obsolesence:

1. Parity Matrix Multiplication Transform (PMMT)

2. The Vectorizability and Parallelizability of Matrix Multiplication

3. Reduced Polar Grid

4. Complicated Physics and Chemistry in Weather Forecasting Models

5. Schuster-Dilts Triangular Matrix Acceleration

6. Generalized Fast Fourier Transforms

18.12.3 Parity and Legendre Transforms

Because the Associated Legendre functions are all either symmetric or antisymmetric with

respect to the equator, a transformation from grid point values to N spectral coef¬cients

(or the reverse) can always be split into two subproblems of size N/2. Thus, after ¬rst

applying the FFT to transform from grid point values in latitude and longtitude to the grid

point values of zonal wavenumber m, a set of grid point values fm (θ) can be combined into

components symmetric and antisymmetric with respect to the equator (where colatitude

(θ = latitude ’ π/2) by taking sums and differences:

1 1

fm (θ) ≡ (fm (θ) + fm (π ’ θ)) , fm (θ) ≡ (fm (θ) ’ fm (π ’ θ)) , θ ∈ [0, π/2]

S A

(18.39)

2 2

Each collection of grid point values of de¬nite parity can then be transformed by a matrix

multiplication into the corresponding even or odd Legendre coef¬cients. Each rectangular

transformation matrix has only half the rows and half the columns of the original, non-

parity-exploiting transformation, but there are two smaller transformations to be taken.

If we ignore the O(N ) cost of taking sums and differences compared to the O(N 2 ) of the

matrix-vector multiplications of the transformations themselves, parity saves exactly a fac-

tor of two. Symmetry is discussed in greater detail in Chapter 8.

18.12.4 Hurrah for Matrix/Vector Multiplication

The multiplication of a vector by a matrix is one of the most fundamental operations of

linear algebra. Because of this, it is an operation which is often coded as a blazingly fast

assembly language program in software libraries. An assembly language routine will typ-

ically speed up the matrix/vector multiplication by a factor of two.

Furthermore, supercomputers with vector-processing hardware, such as the Cray 1 and

its arithmurgical descendants, process the matrix/vector multiplication very ef¬ciently.

Indeed, it is not an exaggeration to say that the vector hardware is built primarily to do

this one operation as ef¬ciently as possible. Because this operation requires no conditional

statements, branching, or any other overhead, Legendre transforms can be computed on

18.12. LEGENDRE TRANSFORMS AND OTHER SORROWS 405

vector hardware at very close to the machine™s optimum ¬‚oating point execution rate. In

contrast, the sparse matrix operations of ¬nite differences require a lot of starting-and-

stopping: one cannot exploit the holes in a sparse matrix without working very hard to

inform the hardware of where the holes are. Because of this, when ¬nite difference and

spherical harmonics codes are compared in terms of ¬‚ops (that is, ¬‚oating point operations

per second), the spherical harmonics program usually has a much higher ¬‚op rate, much

closer to the machine™s theoretical optimum, than a ¬nite difference code. Thus, although

the spherical harmonic code may require a much larger number of ¬‚oating point operations

for a given N than a ¬nite difference or ¬nite element code, the wall-clock “grind time” may

be about the same.

Similarly, the multiplication of a dense matrix by a vector is a speedy process on a mas-

sively parallel machine. A matrix/vector multiply is actually just a set of independent

vector-vector multiplications, one for each row. Furthermore, the MMT matrix elements

are computed in a preprocessing step. It follows that on a massively parallel machine, it is

easy to subdivide the matrix/vector multiply among processors by assigning one or more

rows to each processor. To initiate the parallel computation, it is only necessary to broad-

cast the vector of grid point values to each processor simultaneously, and all ¬‚oating point

operations can then be executed in parallel. Again, because the matrix/vector multiplica-

tion is the fundamental operation of linear algebra, there are very good assembly language

routines and carefully designed algorithms for each type of hardware.

The conclusion is that in high performance computing, Legendre transforms-by-matrix

are much more ef¬cient and less costly than operation counts would suggest. In a computer

science course, one may only be asked to add up multiplications and additions, but in the

real world, it is important to look at the clock.

18.12.5 Reduced Grid and Other Tricks

Hortal and Simmons(1991) and Courtier and Naughton (1994) showed that one could re-

duce the cost of a spectral model by systematically deleting points from the tensor product

grid near the poles with little loss of accuracy. The reductions are not large (perhaps 30%),

but to paraphrase the late U. S. Senator Dirksen: “A billion multiplications here, a billion

additions there ” pretty soon you™re talking about real savings.”

Swarztrauber(1996) has carefully compared a large number of different formulations of

spherical harmonic methods to show that by proper organization, one can minimize the

number of Legendre transforms that are needed on each step of the algorithm.

18.12.6 Schuster-Dilts Triangular Matrix Acceleration

Schuster(1903) discovered that the dense matrix/vector multiplication of the usual Gaussian-

quadrature PMMT could be replaced by multiplication by a triangular matrix to reduce the

cost by a factor of two. Dilts(1985) independently reinvented this technique. The ¬rst part

of the algorithm is a two-dimensional FFT to transform from a grid which is evenly spaced

in both latitude and longtitude to the coef¬cients of a two-dimensional Fourier series. The

second step is to multiply the Fourier coef¬cients by an upper triangular matrix to convert

the Fourier coef¬cients to spherical harmonic coef¬cients.

Because of the familiar identity Tj (cos(θ)) ≡ cos(jθ), the Fourier series in latitude can

be equally well be regarded as a Chebyshev series in the variable x ≡ cos(θ) where θ is

colatitude. This interpretation is conceptually useful because it shows that the function

of the triangular matrix is to convert from one polynomial basis (Chebyshev) to another

polynomial basis (associated Legendre). These conversion is done one zonal wavenumber

CHAPTER 18. SPHERICAL & CYLINDRICAL GEOMETRY

406

at a time. To simplify the discussion, we shall illustrate only the case of zonal wavenumber

m = 0 (ordinary Legendre polynomials).

One important point is that the transformation respects parity. Thus, the even degree

Fourier coef¬cients in latitude transform only to even degree spherical harmonics, and

similarly odd transforms only to odd.

First, split the the θ-dependent coef¬cient of the m = 0 Fourier series into components

symmetric and antisymmetric with respect to the equator as in Eq.(18.39). Let the symmet-

ric part have the Chebyshev polynomial expansion

3

S

(18.40)

fm (x) = b2j T2j (x)

j=0

= b0 + b2 (2x2 ’ 1) + b4 (8x4 ’ 8x2 + 1) + b6 (32x4 ’ 48x4 + 18x2 ’ 1)

which is equivalent to a cosine series in θ where x = cos(θ) where θ is colatitude. (For

simplicity, we take only four terms, but the method can be extended to arbitrary order.)

Our goal is to convert this to the corresponding Legendre series:

3

S

fm (x) = a2j P2j (x)

j=0

32 1 35 4 15 2 3

x’ x’ x+ (18.41)

= a0 + a2 + a4

2 2 8 4 8

231 6 315 4 105 2 5

x’ x’

+a6 x+

16 16 16 16

The expansion of the j-th Chebyshev polynomial in terms of Legendre polynomials has

the coef¬cients (Tj , Pk )/(Pk , Pk ) where (p, q) denotes the unweighted integral of p(x) q(x)

from -1 to 1. Substituting this into the Chebyshev series and collecting Legendre terms, one

¬nds

(18.42)

a = Mb

where a and b are column vectors containing the Legendre and Chebyshev coef¬cients,

respectively, and where the elements of the square matrix M are

1 1

Mij ≡ (18.43)

dx T2j’2 (x)P2i’2 (x) / dx P2i’2 (x)P2i’2 (x), i, j = 1, 2, . . .

’1 ’1

Because the expansion of a polynomial of degree j requires only the Legendre polynomials

of degree j and lower, the matrix M is upper triangular.

For example, if the expansions are truncated at sixth degree, only P6 in the Legendre

expansion and T6 in the Chebyshev series contain a term proportional to x6 . It follows that

32x6 b6 = (231/16)x6 a6 or in other words M44 = 512/231. Similarly, only two terms in each

series contain terms proportional to x4 . This implies 8b4 ’ 48b6 = (35/8)a4 ’ (315/16)a6 .

Since a6 is already known, this can be rewritten as a4 = (8/35){8b4 ’ (48/11)b6 }. Similarly,

a2 can be computed as a weighted sum of b2 , b4 and b6 and a0 as a weighted sum of four

18.12. LEGENDRE TRANSFORMS AND OTHER SORROWS 407

coef¬cients. The necessary multipliers are the upper 4 x 4 block of the matrix M :

’1 ’ 15 ’ 35

1 1

1

a0 b0

3

’ 16 ’ 21

4 4

a2 0 b2

3 21

(18.44)

=

’ 384

64

a4 b4

0 0 35 385

512

a6 b6

0 0 0 231

Similar matrices give the transform from the Chebyshev (latitudinal Fourier) series for

the antisymmetric m = 0 Legendre polynomials and for the associated Legendre functions

of all higher m. The savings is only about a factor of two; note that the cost of the FFT

in latitude is negligible compared to the cost of the triangular matrix/vector multiply in

the limit N ’ ∞. However, this is an adequate reward for implementing a very simple

algorithm. (Healy, Rockmore, Kostelec and Moore, 1998, amusingly call this the “semi-

naive” algorithm precisely because it is so simple.)

Curiously, this algorithm has been never been used in an operational forecasting or cli-

mate code. One reason is that one must store the elements of a different triangular matrix

for each equatorial parity and each m, which adds up to a lot of storage. A more signi¬-

cant reason is that the Schuster-Dilts algorithm requires grid point values at evenly spaced

points in θ, which is a Chebyshev grid in x = cos(θ). The usual PMMT algorithm uses

points on the Legendre grid. Because a typical forecasting or climate code is O(50, 000)

lines of code, it is not trivial to rewrite a model to accept a different grid, even though

the two grids are very similar and most of the chemistry and hydrologic cycle, etc., are

evaluated point-by-point.

18.12.7 Generalized FFT: Multipoles and All That

As reviewed in Chapter 10, generalizations of the Fast Fourier Transform have been devel-

oped which replace the O(N 2 ) cost of a Legendre transform (for a given zonal wavenum-

ber) by a cost proportional to O(N log2 (N )) operations ” the same as for the Fast Fourier

Transform. Orszag™s (1986) method exploits the three-term recurrence relation satis¬ed by

the Associated Legendre Functions. Boyd(1992c) and Dutt&Rokhlin(1993, 1995) indepen-

dently observed that Fast Multipole Methods could also be applied. Recently, Rockmore et

al.(1998) have developed a very sophisticated algorithm speci¬cally for spherical harmon-

ics.

The bad news is that these algorithms have a huge proportionality constant compared

to the FFT. As a result, none of the generalized FFTs has ever been used in an operational

climate or weather forecasting model. However, as forecasting resolutions climb above

T400, these algorithms may become faster than the PMMT, and perhaps in time replace it.

18.12.8 Summary

The relative slowness of the PMMT (Parity-Exploiting Matrix Multiplication Transform)

for transforming the latitudinal basis functions is the great inef¬ciency-maker for spherical

harmonics algorithms. However, clever tricks, as catalogued above, have kept the spherical

harmonics from obsolesence.

CHAPTER 18. SPHERICAL & CYLINDRICAL GEOMETRY

408

Table 18.5: Legendre Transforms Bibliography

References Comments