
continued fractions, these authors show that the vector ε-algorithm is the best all-purpose algorithm for the acceleration of vector sequences.

There is a subject which can be related both to interpolation (more precisely, Hermite interpolation by a rational function at the point zero) and to convergence acceleration: Padé approximation. Padé approximation is strongly connected to continued fractions, one of the oldest subjects in mathematics, since Euclid's g.c.d. algorithm is an expansion into a terminating continued fraction.

Preface / Journal of Computational and Applied Mathematics 122 (2000) ix–xi

Although they were implicitly known before, Padé approximants were really introduced by Johann Heinrich Lambert in 1758 and Joseph Louis Lagrange in 1776. Padé approximants have important applications in many branches of applied sciences when the solution of a problem is obtained as a power series expansion and some of its properties have to be guessed from its first Taylor coefficients. In this volume, two papers deal with nonclassical applications of Padé approximation.

M. Prévost shows how Padé approximants can be used to obtain Diophantine approximations of real and complex numbers, thus proving irrationality. Padé approximation of the asymptotic expansion of the remainder of a series also provides Diophantine approximations.

The solution of a discrete dynamical system can be related to matrix Hermite–Padé approximants, an approach developed in the paper by V. Sorokin and J. van Iseghem. Spectral properties of the band operator are investigated. The inverse spectral method is used for the solution of dynamical systems defined by a Lax pair.

Obviously, not all aspects of interpolation and extrapolation have been treated in this volume. However, many important topics have been covered.

I would like to thank all authors for their efforts.

Journal of Computational and Applied Mathematics 122 (2000) 1–21

www.elsevier.nl/locate/cam

Convergence acceleration during the 20th century

C. Brezinski

Laboratoire d'Analyse Numérique et d'Optimisation, UFR IEEA, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq cedex, France

Received 8 March 1999; received in revised form 12 October 1999

1. Introduction

In numerical analysis many methods produce sequences, for instance iterative methods for solving systems of equations, methods involving series expansions, discretization methods (that is, methods depending on a parameter such that the approximate solution tends to the exact one when the parameter tends to zero), perturbation methods, etc. Sometimes, the convergence of these sequences is slow and their effective use is quite limited. Convergence acceleration methods consist of transforming a slowly converging sequence (S_n) into a new sequence (T_n) converging to the same limit faster than the initial one. Among such sequence transformations, the most well known are certainly Richardson's extrapolation algorithm and Aitken's Δ² process. All known methods are constructed

by extrapolation and they are often called extrapolation methods. The idea consists of interpolating the terms S_n, S_{n+1}, ..., S_{n+k} of the sequence to be transformed by a sequence satisfying a certain relationship depending on parameters. This set of sequences is called the kernel of the transformation, and every sequence of this set is transformed into a constant sequence by the transformation under consideration. For example, as we will see below, the kernel of Aitken's Δ² process is the set of sequences satisfying, for all n, a_0(S_n - S) + a_1(S_{n+1} - S) = 0, where a_0 and a_1 are parameters such that a_0 + a_1 ≠ 0. If Aitken's process is applied to such a sequence, then the constant sequence (T_n = S)

is obtained. The parameters involved in the definition of the kernel are uniquely determined by the interpolation conditions, and then the limit of the interpolating sequence of the kernel is taken as an approximation of the limit of the sequence to be accelerated. Since this limit depends on the index n, it will be denoted by T_n. Effectively, the sequence (S_n) has been transformed into a new sequence (T_n).

This paper, which is based on [31], but includes new developments obtained since 1995, presents

my personal views on the historical development of this subject during the 20th century. I do not

pretend to be exhaustive nor even to quote every important contribution (if a reference does not

E-mail address: claude.brezinski@univ-lille1.fr (C. Brezinski)

0377-0427/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.

PII: S 0 3 7 7 - 0 4 2 7 ( 0 0 ) 0 0 3 6 0 - 5


appear below, it does not mean that it is less valuable). I refer the interested reader to the literature and, in particular, to the recent books [55,146,33,144]. For an extensive bibliography, see [28].

I will begin with scalar sequences and then treat the case of vector ones. As we will see, a

sequence transformation able to accelerate the convergence of all scalar sequences cannot exist.

Thus, it is necessary to obtain many different convergence acceleration methods, each being suitable

for a particular class of sequences. Many authors have studied the properties of these procedures and

proved some important classes of sequences to be accelerable by a given algorithm. Scalar sequence

transformations have also been extensively studied from the theoretical point of view.

The situation is more complicated and more interesting for vector sequences. In the case of a

sequence of vectors, it is always possible to apply a scalar acceleration procedure componentwise.

However, such a strategy does not take into account connections which may exist between the

various components, as in the important case of sequences arising from iterative methods for solving

a system of linear or nonlinear equations.

2. Scalar sequences

Let (S_n) be a scalar sequence converging to a limit S. As explained above, an extrapolation method consists of transforming this sequence into a new one, (T_n), by a sequence transformation T : (S_n) → (T_n). The transformation T is said to accelerate the convergence of the sequence (S_n) if and only if

$$\lim_{n \to \infty} \frac{T_n - S}{S_n - S} = 0.$$

We can then say that (T_n) converges (to S) faster than (S_n).

The first methods to have been used were linear transformations

$$T_n = \sum_{i=0}^{\infty} a_{ni} S_i, \qquad n = 0, 1, \ldots,$$

where the numbers a_{ni} are constants independent of the terms of the sequence (S_n). Such a linear transformation is usually called a summation process and its properties are completely determined by the matrix A = (a_{ni}). For practical reasons, only a finite number of the coefficients a_{ni} are different from zero for each n. Among such processes are those named after Euler, Cesàro and Hölder. In the case of linear methods, the convergence of the sequence (T_n) to S for any converging sequence (S_n) is governed by the Toeplitz summability theorem; see [115] for a review. Examples of such processes are

$$T_n = \frac{1}{n+1} \sum_{i=0}^{n} S_i$$

or

$$T_n = \frac{1}{k+1} \sum_{i=n}^{n+k} S_i.$$

In the second case, the sequence (T_n) also depends on a second index, k, and the convergence has to be studied either when k is fixed and n tends to infinity, or when n is fixed and k tends to infinity.
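These two averaging processes are simple enough to sketch in code. The example below is my own illustration (not from the paper); the test sequence, the partial sums of the alternating harmonic series with limit log 2, is my choice as well:

```python
import math

def partial_sums(n_terms):
    """S_n: partial sums of the alternating harmonic series (limit log 2)."""
    s, out = 0.0, []
    for i in range(1, n_terms + 1):
        s += (-1) ** (i + 1) / i
        out.append(s)
    return out

def cesaro_means(S):
    """First process above: T_n = (S_0 + ... + S_n) / (n + 1)."""
    return [sum(S[: n + 1]) / (n + 1) for n in range(len(S))]

def sliding_means(S, k):
    """Second process above: T_n = (S_n + ... + S_{n+k}) / (k + 1)."""
    return [sum(S[n : n + k + 1]) / (k + 1) for n in range(len(S) - k)]

S = partial_sums(50)
T = sliding_means(S, 1)           # average of two consecutive partial sums
print(abs(S[-1] - math.log(2)))   # error of the raw sequence
print(abs(T[-1] - math.log(2)))   # noticeably smaller error
```

For this alternating series the simple sliding mean already gains roughly an order of magnitude, which illustrates the point made next: a fixed linear process helps only on the restricted class of sequences it was designed for.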


With respect to convergence acceleration, summation processes are usually only able to accelerate the convergence of restricted classes of sequences, and this is why the numerical analysts of the 20th century turned their efforts to nonlinear transformations. However, there is one exception: Richardson's extrapolation process.

2.1. Richardson's process

It seems that the first appearance of a particular case of what is now called the Richardson extrapolation process is due to Christian Huygens (1629–1695). In 1903, Robert Moir Milne (1873) applied the idea of Huygens for computing π [101]. The same idea was exploited again by Karl Kommerell (1871–1948) in his book of 1936 [78]. As explained in [143], Kommerell can be considered as the real discoverer of Romberg's method, although he used this scheme in the context of approximating π.

Let us now come to the procedures used for improving the accuracy of the trapezoidal rule for computing approximations to a definite integral. In the case of a sufficiently smooth function, the error of this method is given by the Euler–Maclaurin expansion. In 1742, Colin Maclaurin (1698–1746) [90] showed that its precision could be improved by forming linear combinations of the results obtained with various stepsizes. His procedure can be interpreted as a preliminary version of Romberg's method; see [49] for a discussion.

In 1900, William Fleetwood Sheppard (1863–1936) used an elimination strategy in the Euler–Maclaurin quadrature formula with h_n = r_n h and 1 = r_0 < r_1 < r_2 < ··· to produce a better approximation to the given integral [132].

In 1910, combining the results obtained with the stepsizes h and 2h, Lewis Fry Richardson (1881–1953) eliminated the first term in a discretization process using central differences [119]. He called this procedure the deferred approach to the limit or h²-extrapolation. The transformed sequence (T_n) is given by

$$T_n = \frac{h_{n+1}^2 S(h_n) - h_n^2 S(h_{n+1})}{h_{n+1}^2 - h_n^2}.$$

In a 1927 paper [120] he used the same technique to solve a 6th-order differential eigenvalue problem. His process was called (h², h⁴)-extrapolation. Richardson extrapolation consists of computing the value at 0, denoted by T_k^{(n)}, of the interpolation polynomial of degree at most k which passes through the points (x_n, S_n), ..., (x_{n+k}, S_{n+k}). Using the Neville–Aitken scheme for these interpolation polynomials, we immediately obtain

$$T_{k+1}^{(n)} = \frac{x_{n+k+1} T_k^{(n)} - x_n T_k^{(n+1)}}{x_{n+k+1} - x_n}$$

with T_0^{(n)} = S_n.
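The recurrence is easy to program. The sketch below is my own illustration (the function names and the test function are mine, not from the paper); it builds the table T_k^{(n)} row by row and uses Romberg-like points x_n = h_n², h_n = h_0/2^n, on S(h) = cos(h), whose expansion at h = 0 contains only even powers, so the limit S(0) = 1 is recovered to near machine precision:

```python
import math

def richardson(x, S):
    """Neville-Aitken table for the value at 0: T[k][n], with T[0][n] = S_n."""
    T = [list(S)]
    for k in range(len(S) - 1):
        prev, row = T[-1], []
        for n in range(len(prev) - 1):
            xk = x[n + k + 1]
            # T_{k+1}^{(n)} = (x_{n+k+1} T_k^{(n)} - x_n T_k^{(n+1)}) / (x_{n+k+1} - x_n)
            row.append((xk * prev[n] - x[n] * prev[n + 1]) / (xk - x[n]))
        T.append(row)
    return T

h = [1.0 / 2 ** n for n in range(6)]   # h_n = h_0 / 2^n with h_0 = 1
x = [t * t for t in h]                 # x_n = h_n^2
S = [math.cos(t) for t in h]           # S(h_n); the limit S(0) is 1
T = richardson(x, S)
print(abs(S[-1] - 1.0))    # error of the best raw value
print(abs(T[-1][0] - 1.0)) # extrapolated error, many orders of magnitude smaller
```

The choice x_n = h_n² is what makes this a sketch of Romberg-style extrapolation: the even-power error expansion in h becomes an ordinary polynomial in x.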

Let us mention that Richardson referred to a 1926 paper by Nikolai Nikolaevich Bogolyubov (born in 1909) and Nikolai Mitrofanovich Krylov (1879–1955) where the procedure (often called the deferred approach to the limit) can already be found [11].

In 1955, Werner Romberg (born in 1909) was the first to use repeatedly an elimination approach for improving the accuracy of the trapezoidal rule [121]. He himself refers to the book of Lothar Collatz (1910–1990) of 1951 [50]. The procedure became widely known after the rigorous error analysis given in 1961 by Friedrich L. Bauer [3] and the work of Eduard L. Stiefel (1909–1978) [138]. Romberg's derivation of his process was heuristic. It was proved by Pierre-Jean Laurent in 1963 [81] that the process comes out from the Richardson process by choosing x_n = h_n² and h_n = h_0/2^n. Laurent also gave conditions on the choice of the sequence (x_n) in order that the sequences (T_k^{(n)}) tend to S either when k or n tends to infinity. Weaker conditions were given by Michel Crouzeix and Alain L. Mignot in [52, pp. 52–55]. As we shall see below, extensions of Romberg's method to nonsmooth integrands lead to a method called the E-algorithm.

Applications of extrapolation to the numerical solution of ordinary differential equations were studied by H.C. Bolton and H.I. Scoins in 1956 [12], Roland Bulirsch and Josef Stoer in 1964–1966 [47] and William B. Gragg [65] in 1965. The case of difference methods for partial differential equations was treated by Guri Ivanovich Marchuk and V.V. Shaidurov [91]. Sturm–Liouville problems are discussed in [117]. Finally, we mention that Heinz Rutishauser (1918–1970) pointed out in 1963 [122] that Romberg's idea can be applied to any sequence as long as the error has an asymptotic expansion of a form similar to the Euler–Maclaurin's.

For a detailed history of the Richardson method, its developments and applications, see [57,77,143].

2.2. Aitken's Δ² process and the ε-algorithm

The most popular nonlinear acceleration method is certainly Aitken's Δ² process, which is given by

$$T_n = \frac{S_n S_{n+2} - S_{n+1}^2}{S_{n+2} - 2S_{n+1} + S_n} = S_n - \frac{(S_{n+1} - S_n)^2}{S_{n+2} - 2S_{n+1} + S_n}, \qquad n = 0, 1, \ldots$$
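As a quick illustration (mine, not from the paper), here is the second form of the process applied to a sequence of the form S_n = S + a·r^n, the kernel described in the introduction; the limit is recovered from only three consecutive terms:

```python
def aitken(S):
    """Aitken's delta^2 process: T_n = S_n - (dS_n)^2 / (d2S_n)."""
    T = []
    for n in range(len(S) - 2):
        d1 = S[n + 1] - S[n]                 # first difference
        d2 = S[n + 2] - 2 * S[n + 1] + S[n]  # second difference
        T.append(S[n] - d1 * d1 / d2)
    return T

# Kernel sequence: S_n = 2 + 3 * 0.5^n converges linearly to 2.
S = [2.0 + 3.0 * 0.5 ** n for n in range(10)]
T = aitken(S)
print(T[0])   # → 2.0: the limit, exactly, since (S_n) lies in the kernel
```

On sequences outside the kernel the process is no longer exact, but it still accelerates every linearly converging sequence, as stated below.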

The method was stated by Alexander Craig Aitken (1895–1967) in 1926 [1], who used it to accelerate the convergence of Bernoulli's method for computing the dominant zero of a polynomial. Aitken pointed out that the same method was obtained by Hans von Naegelsbach (1838) in 1876 in his study of Furstenau's method for solving nonlinear equations [104]. The process was also given by James Clerk Maxwell (1831–1879) in his Treatise on Electricity and Magnetism of 1873 [95]. However, neither Naegelsbach nor Maxwell used it for the purpose of acceleration. Maxwell wanted to find the equilibrium position of a pointer oscillating with an exponentially damped simple harmonic motion from three experimental measurements. It is surprising that Aitken's process was known to Takakazu Seki (1642–1708), often considered the greatest Japanese mathematician. In his book Katsuyo Sanpo, Vol. IV, he used this process to compute the value of π, the length of a chord and the volume of a sphere. This book was written around 1680 but only published in 1712 by his disciple Murahide Araki. Parts of it can be found in [73]. Let us mention that the Japanese characters corresponding to Takakazu have another pronunciation, which is Kowa. This is the reason why this mathematician is often called, erroneously as in [29,31], Seki Kowa.

What makes Aitken's process so popular is that it accelerates the convergence of all linearly converging sequences, that is, sequences such that there exists a ≠ 1 with

$$\lim_{n \to \infty} \frac{S_{n+1} - S}{S_n - S} = a.$$

It can even accelerate some logarithmic sequences (that is, those corresponding to a = 1), which are the sequences with the slowest convergence and the most difficult to accelerate.


Aitken's Δ² process is exact (which means that T_n = S for all n) for sequences satisfying a_0(S_n - S) + a_1(S_{n+1} - S) = 0 for all n, with a_0 a_1 ≠ 0 and a_0 + a_1 ≠ 0. Such sequences form the kernel of Aitken's process. The idea naturally arose of finding a transformation with the kernel

$$a_0(S_n - S) + \cdots + a_k(S_{n+k} - S) = 0, \qquad \forall n,$$

with a_0 a_k ≠ 0 and a_0 + ··· + a_k ≠ 0. The particular case k = 2 was already treated by Maxwell in his book of 1873, and the case of an arbitrary value of k was studied by T.H. O'Beirne in 1947 [107]. This last work remains almost unknown since it was published only as an internal report. The problem was handled in full generality by Daniel Shanks (1917–1996) in 1949 [130] and again in 1955 [131]. He obtained the sequence transformation defined by

$$T_n = e_k(S_n) = \frac{\begin{vmatrix} S_n & S_{n+1} & \cdots & S_{n+k} \\ S_{n+1} & S_{n+2} & \cdots & S_{n+k+1} \\ \vdots & \vdots & & \vdots \\ S_{n+k} & S_{n+k+1} & \cdots & S_{n+2k} \end{vmatrix}}{\begin{vmatrix} \Delta^2 S_n & \cdots & \Delta^2 S_{n+k-1} \\ \vdots & & \vdots \\ \Delta^2 S_{n+k-1} & \cdots & \Delta^2 S_{n+2k-2} \end{vmatrix}}.$$

When k = 1, Shanks' transformation reduces to Aitken's Δ² process. It can be proved that e_k(S_n) = S for all n if and only if (S_n) belongs to the kernel of the transformation given above. The same ratios of determinants were obtained by R.J. Schmidt in 1941 [127] in his study of a method for solving systems of linear equations.
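The exactness property can be checked numerically from the ratio of Hankel determinants. The sketch below is my own illustration, not an implementation from the paper: it uses exact rational arithmetic and a naive cofactor determinant (adequate only for small k), and applies e_2 to S_n = 1 + (1/2)^n + (1/3)^n, which belongs to the kernel for k = 2:

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def shanks(S, n, k):
    """e_k(S_n): Hankel determinant of the S's over that of the second differences."""
    num = [[S[n + i + j] for j in range(k + 1)] for i in range(k + 1)]
    d2 = [S[m + 2] - 2 * S[m + 1] + S[m] for m in range(len(S) - 2)]
    den = [[d2[n + i + j] for j in range(k)] for i in range(k)]
    return det(num) / det(den)

# Kernel member for k = 2: S_n = 1 + (1/2)^n + (1/3)^n, limit 1.
S = [1 + Fraction(1, 2) ** n + Fraction(1, 3) ** n for n in range(8)]
print(shanks(S, 0, 2))   # → 1, the limit, exactly
print(shanks(S, 0, 1))   # k = 1 (Aitken) is not exact on this sequence
```

Cofactor expansion costs O(k!·k) and is used here only to mirror the determinant formula literally; the recursive algorithms discussed next exist precisely because computing these determinants directly is impractical.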

The determinants involved in the definition of e_k(S_n) have a very special structure. They are called Hankel determinants and were studied by Hermann Hankel (1839–1873) in his thesis in 1861 [72]. Such determinants satisfy a five-term recurrence relationship. This relation was used by O'Beirne and Shanks to implement the transformation by computing separately the numerators and the denominators of the e_k(S_n)'s. However, numerical analysts know it is difficult to compute
