. 2
( 83 .)


After reminding, in the scalar case, the connection between the -algorithm, Padà approximants and
continued fractions, these authors show that the vector -algorithm is the best all-purpose algorithm
for the acceleration of vector sequences.
There is a subject which can be related either to interpolation (more precisely, Hermite interpo-
lation by a rational function at the point zero) and to convergence acceleration: it is Padà approx-
imation. Padà approximation is strongly connected to continued fractions, one of the oldest subject
in mathematics since Euclid g.c.d. algorithm is an expansion into a terminating continued fraction.
Preface / Journal of Computational and Applied Mathematics 122 (2000) ix“xi xi

Although they were implicitly known before, Padà approximants were really introduced by Johan
Heinrich Lambert in 1758 and Joseph Louis Lagrange in 1776. Padà approximants have impor-
tant applications in many branches of applied sciences when the solution of a problem is obtained
as a power series expansion and some of its properties have to be guessed from its ÿrst Taylor
coe cients. In this volume, two papers deal with nonclassical applications of Padà approximation.
M. Prà vost shows how Padà approximants can be used to obtain Diophantine approximations
e e
of real and complex numbers and then proving irrationality. Padà approximation of the asymptotic
expansion of the remainder of a series also provides Diophantine approximations.
The solution of a discrete dynamical system can be related to matrix Hermite“Padà approximants,
an approach developed in the paper by V. Sorokin and J. van Iseghem. Spectral properties of the
band operator are investigated. The inverse spectral method is used for the solution of dynamical
systems deÿned by a Lax pair.
Obviously, all aspects of interpolation and extrapolation have not been treated in this volume.
However, many important topics have been covered.
I would like to thank all authors for their e orts.
Journal of Computational and Applied Mathematics 122 (2000) 1“21

Convergence acceleration during the 20th century
C. Brezinski
Laboratoire d™Analyse Numà rique et d™Optimisation, UFR IEEA, Università des Sciences et Technologies de Lille,
e e
59655 “Villeneuve d™Ascq cedex, France

Received 8 March 1999; received in revised form 12 October 1999

1. Introduction

In numerical analysis many methods produce sequences, for instance iterative methods for solving
systems of equations, methods involving series expansions, discretization methods (that is methods
depending on a parameter such that the approximate solution tends to the exact one when the pa-
rameter tends to zero), perturbation methods, etc. Sometimes, the convergence of these sequences
is slow and their e ective use is quite limited. Convergence acceleration methods consist of trans-
forming a slowly converging sequence (Sn ) into a new sequence (Tn ) converging to the same limit
faster than the initial one. Among such sequence transformations, the most well known are certainly
Richardson™s extrapolation algorithm and Aitken™s 2 process. All known methods are constructed
by extrapolation and they are often called extrapolation methods. The idea consists of interpolating
the terms Sn ; Sn+1 ; : : : ; Sn+k of the sequence to be transformed by a sequence satisfying a certain
relationship depending on parameters. This set of sequences is called kernel of the transformation
and every sequence of this set is transformed into a constant sequence by the transformation into
consideration. For example, as we will see below, the kernel of Aitken™s 2 process is the set of
sequences satisfying ∀n; a0 (Sn ’ S) + a1 (Sn+1 ’ S) = 0, where a0 and a1 are parameters such that
a0 + a1 = 0. If Aitken™s process is applied to such a sequence, then the constant sequence (Tn = S)
is obtained. The parameters involved in the deÿnition of the kernel are uniquely determined by the
interpolation conditions and then the limit of the interpolating sequence of the kernel is taken as an
approximation of the limit of the sequence to be accelerated. Since this limit depends on the index
n, it will be denoted by Tn . E ectively, the sequence (Sn ) has been transformed into a new sequence
(Tn ).
This paper, which is based on [31], but includes new developments obtained since 1995, presents
my personal views on the historical development of this subject during the 20th century. I do not
pretend to be exhaustive nor even to quote every important contribution (if a reference does not

E-mail address: claude.brezinski@univ-lille1.fr (C. Brezinski)

0377-0427/00/$ - see front matter c 2000 Elsevier Science B.V. All rights reserved.
PII: S 0 3 7 7 - 0 4 2 7 ( 0 0 ) 0 0 3 6 0 - 5
2 C. Brezinski / Journal of Computational and Applied Mathematics 122 (2000) 1“21

appear below, it does not mean that it is less valuable). I refer the interested reader to the literature
and, in particular to the recent books [55,146,33,144]. For an extensive bibliography, see [28].
I will begin with scalar sequences and then treat the case of vector ones. As we will see, a
sequence transformation able to accelerate the convergence of all scalar sequences cannot exist.
Thus, it is necessary to obtain many di erent convergence acceleration methods, each being suitable
for a particular class of sequences. Many authors have studied the properties of these procedures and
proved some important classes of sequences to be accelerable by a given algorithm. Scalar sequence
transformations have also been extensively studied from the theoretical point of view.
The situation is more complicated and more interesting for vector sequences. In the case of a
sequence of vectors, it is always possible to apply a scalar acceleration procedure componentwise.
However, such a strategy does not take into account connections which may exist between the
various components, as in the important case of sequences arising from iterative methods for solving
a system of linear or nonlinear equations.

2. Scalar sequences

Let (Sn ) be a scalar sequence converging to a limit S. As explained above, an extrapolation
method consists of transforming this sequence into a new one, (Tn ), by a sequence transformation
T : (Sn ) ’ (Tn ). The transformation T is said to accelerate the convergence of the sequence (Sn ) if
and only if
Tn ’ S
lim = 0:
n’∞ Sn ’ S

We can then say that (Tn ) converges (to S) faster than (Sn ).
The ÿrst methods to have been used were linear transformations

Tn = ani Si ; n = 0; 1; : : : ;

where the numbers ani are constants independent of the terms of the sequence (Sn ). Such a linear
transformation is usually called a summation process and its properties are completely determined
by the matrix A = (ani ). For practical reasons, only a ÿnite number of the coe cients ani are di erent
from zero for each n. Among such processes are those named after Euler, Cesaro and Holder. In
the case of linear methods, the convergence of the sequence (Tn ) to S for any converging sequence
(Sn ) is governed by the Toeplitz summability theorem; see [115] for a review. Examples of such
processes are
Tn = Si
n+1 i=0
Tn = Si :
k +1 i=n

In the second case, the sequence (Tn ) also depends on a second index, k, and the convergence has to
be studied either when k is ÿxed and n tends to inÿnity, or when n is ÿxed and k tends to inÿnity.
C. Brezinski / Journal of Computational and Applied Mathematics 122 (2000) 1“21 3

With respect to convergence acceleration, summation processes are usually only able to accelerate
the convergence of restricted classes of sequences and this is why the numerical analysts of the
20th century turned their e orts to nonlinear transformations. However, there is one exception:
Richardson™s extrapolation process.

2.1. Richardson™s process

It seems that the ÿrst appearance of a particular case of what is now called the Richardson ex-
trapolation process is due to Christian Huygens (1629 “1695). In 1903, Robert Moir Milne (1873)
applied the idea of Huygens for computing [101]. The same idea was exploited again by Karl
Kommerell (1871“1948) in his book of 1936 [78]. As explained in [143], Kommerell can be con-
sidered as the real discoverer of Romberg™s method although he used this scheme in the context of
approximating .
Let us now come to the procedures used for improving the accuracy of the trapezoidal rule for
computing approximations to a deÿnite integral. In the case of a su ciently smooth function, the
error of this method is given by the Euler“Maclaurin expansion. In 1742, Colin Maclaurin (1698“
1746) [90] showed that its precision could be improved by forming linear combinations of the
results obtained with various stepsizes. His procedure can be interpreted as a preliminary version of
Romberg™s method; see [49] for a discussion.
In 1900, William Fleetwood Sheppard (1863“1936) used an elimination strategy in the Euler“
Maclaurin quadrature formula with hn = rn h and 1 = r0 ¡ r1 ¡ r2 ¡ · · · to produce a better approxi-
mation to the given integral [132].
In 1910, combining the results obtained with the stepsizes h and 2h, Lewis Fry Richardson (1881“
1953) eliminated the ÿrst term in a discretization process using central di erences [119]. He called
this procedure the deferred approach to the limit or h2 -extrapolation. The transformed sequence
(Tn ) is given by
h2 S(hn ) ’ h2 S(hn+1 )
Tn = n+1 2 n
hn+1 ’ h2 n

In a 1927 paper [120] he used the same technique to solve a 6th order di erential eigenvalue problem.
His process was called (h2 ; h4 )-extrapolation. Richardson extrapolation consists of computing the
value at 0, denoted by Tk(n) , of the interpolation polynomial of the degree at most k, which passes
through the points (x n ; Sn ); : : : ; (x n+k ; Sn+k ). Using the Neville“Aitken scheme for these interpolation
polynomials, we immediately obtain
x n+k+1 Tk(n) ’ x n Tk(n+1)
Tk+1 =
x n+k+1 ’ x n
with T0(n) = Sn .
Let us mention that Richardson referred to a 1926 paper by Nikolai Nikolaevich Bogolyubov
(born in 1909) and Nikolai Mitrofanovich Krylov (1879 “1955) where the procedure (often called
the deferred approach to the limit) can already be found [11].
In 1955, Werner Romberg (born in 1909) was the ÿrst to use repeatedly an elimination approach
for improving the accuracy of the trapezoidal rule [121]. He himself refers to the book of Lothar
Collatz (1910 “1990) of 1951 [50]. The procedure became widely known after the rigorous error
4 C. Brezinski / Journal of Computational and Applied Mathematics 122 (2000) 1“21

analysis given in 1961 by Friedrich L. Bauer [3] and the work of Eduard L. Stiefel (1909 “1978)
[138]. Romberg™s derivation of his process was heuristic. It was proved by Pierre-Jean Laurent in
1963 [81] that the process comes out from the Richardson process by choosing x n =h2 and hn =h0 =2n .
Laurent also gave conditions on the choice of the sequence (x n ) in order that the sequences (Tk(n) )
tend to S either when k or n tends to inÿnity. Weaker conditions were given by Michel Crouzeix
and Alain L. Mignot in [52, pp. 52“55]. As we shall see below, extensions of Romberg™s method
to nonsmooth integrands leads to a method called the E-algorithm.
Applications of extrapolation to the numerical solution of ordinary di erential equations were stud-
ied by H.C. Bolton and H.I. Scoins in 1956 [12], Roland Bulirsch and Josef Stoer in 1964 “1966 [47]
and William B. Gragg [65] in 1965. The case of di erence methods for partial di erential equations
was treated by Guri Ivanovich Marchuk and V.V. Shaidurov [91]. Sturm“Liouville problems are
discussed in [117]. Finally, we mention that Heinz Rutishauser (1918“1970) pointed out in 1963
[122] that Romberg™s idea can be applied to any sequence as long as the error has an asymptotic
expansion of a form similar to the Euler“Maclaurin™s.
For a detailed history of the Richardson method, its developments and applications, see [57,77,143].

2.2. Aitken™s process and the -algorithm

The most popular nonlinear acceleration method is certainly Aitken™s process which is given
(Sn+1 ’ Sn )2
Sn Sn+2 ’ Sn+1
Tn = = Sn ’ ; n = 0; 1; : : :
Sn+2 ’ 2Sn+1 + Sn Sn+2 ’ 2Sn+1 + Sn
The method was stated by Alexander Craig Aitken (1895 “1967) in 1926 [1], who used it to ac-
celerate the convergence of Bernoulli™s method for computing the dominant zero of a polynomial.
Aitken pointed out that the same method was obtained by Hans von Naegelsbach (1838) in 1876
in his study of Furstenau™s method for solving nonlinear equations [104]. The process was also
given by James Clerk Maxwell (1831“1879) in his Treatise on Electricity and Magnetism of 1873
[95]. However, neither Naegelsbach nor Maxwell used it for the purpose of acceleration. Maxwell
wanted to ÿnd the equilibrium position of a pointer oscillating with an exponentially damped simple
harmonic motion from three experimental measurements. It is surprising that Aitken™s process was
known to Takakazu Seki (1642“1708), often considered the greatest Japanese mathematician. In his
book Katsuyo Sanpo, Vol. IV, he used this process to compute the value of , the length of a chord
and the volume of a sphere. This book was written around 1680 but only published in 1712 by his
disciple Murahide Araki. Parts of it can be found in [73]. Let us mention that the Japanese characters
corresponding to Takakazu have another pronounciation which is Kowa. This is the reason why this
mathematician is often called, erroneously as in [29,31] Seki Kowa.
What makes Aitken™s process so popular is that it accelerates the convergence of all linearly
converging sequences, that is sequences such that ∃a = 1
Sn+1 ’ S
lim = a:
n’∞ Sn ’ S

It can even accelerate some logarithmic sequences (that is corresponding to a = 1) which are those
with the slowest convergence and the most di cult to accelerate.
C. Brezinski / Journal of Computational and Applied Mathematics 122 (2000) 1“21 5

Aitken™s 2 process is exact (which means that ∀n; Tn = S) for sequences satisfying, a0 (Sn ’ S) +
a1 (Sn+1 ’ S) = 0; ∀n; a0 a1 = 0; a0 + a1 = 0. Such sequences form the kernel of Aitken™s process.
The idea naturally arose of ÿnding a transformation with the kernel
a0 (Sn ’ S) + · · · + ak (Sn+k ’ S) = 0; ∀n;
a0 ak = 0; a0 + · · · + ak = 0. A particular case of k = 2 was already treated by Maxwell in his
book of 1873 and a particular case of an arbitrary value of k was studied by T.H. O™Beirne in 1947
[107]. This last work remains almost unknown since it was published only as an internal report. The
problem was handled in full generality by Daniel Shanks (1917“1996) in 1949 [130] and again in
1955 [131]. He obtained the sequence transformation deÿned by
Sn Sn+1 · · · Sn+k
Sn+1 Sn+2 · · · Sn+k+1
. . .
. . .
. . .
Sn+k Sn+k+1 · · · Sn+2k
Tn = ek (Sn ) = :
Sn · · · 2 Sn+k’1
. .
. .
. .
2 2
Sn+k’1 · · · Sn+2k’2
When k = 1, Shanks transformation reduces to the Aitken™s 2 process. It can be proved that
ek (Sn ) = S; ∀n if and only if (Sn ) belongs to the kernel of the transformation given above. The same
ratios of determinants were obtained by R.J. Schmidt in 1941 [127] in his study of a method for
solving systems of linear equations.
The determinants involved in the deÿnition of ek (Sn ) have a very special structure. They are
called Hankel determinants and were studied by Hermann Hankel (1839 “1873) in his thesis in
1861 [72]. Such determinants satisfy a ÿve-term recurrence relationship. This relation was used
by O™Beirne and Shanks to implement the transformation by computing separately the numerators
and the denominators of the ek (Sn )™s. However, numerical analysts know it is di cult to compute


. 2
( 83 .)