continued fractions, these authors show that the vector ε-algorithm is the best all-purpose algorithm
for the acceleration of vector sequences.
There is a subject which can be related both to interpolation (more precisely, Hermite interpolation by a rational function at the point zero) and to convergence acceleration: Padé approximation. Padé approximation is strongly connected to continued fractions, one of the oldest subjects in mathematics, since Euclid's g.c.d. algorithm is an expansion into a terminating continued fraction.
Although they were implicitly known before, Padé approximants were really introduced by Johann Heinrich Lambert in 1758 and Joseph Louis Lagrange in 1776. Padé approximants have important applications in many branches of applied sciences when the solution of a problem is obtained as a power series expansion and some of its properties have to be guessed from its first Taylor coefficients. In this volume, two papers deal with nonclassical applications of Padé approximation.
M. Prévost shows how Padé approximants can be used to obtain Diophantine approximations of real and complex numbers, and then to prove irrationality. Padé approximation of the asymptotic expansion of the remainder of a series also provides Diophantine approximations.
The solution of a discrete dynamical system can be related to matrix Hermite–Padé approximants, an approach developed in the paper by V. Sorokin and J. van Iseghem. Spectral properties of the band operator are investigated. The inverse spectral method is used for the solution of dynamical systems defined by a Lax pair.
Obviously, all aspects of interpolation and extrapolation have not been treated in this volume.
However, many important topics have been covered.
I would like to thank all authors for their efforts.
Journal of Computational and Applied Mathematics 122 (2000) 1–21

Convergence acceleration during the 20th century

C. Brezinski

Laboratoire d'Analyse Numérique et d'Optimisation, UFR IEEA, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq cedex, France

Received 8 March 1999; received in revised form 12 October 1999
In numerical analysis many methods produce sequences, for instance iterative methods for solving
systems of equations, methods involving series expansions, discretization methods (that is methods
depending on a parameter such that the approximate solution tends to the exact one when the pa-
rameter tends to zero), perturbation methods, etc. Sometimes, the convergence of these sequences is slow and their effective use is quite limited. Convergence acceleration methods consist of transforming a slowly converging sequence (S_n) into a new sequence (T_n) converging to the same limit faster than the initial one. Among such sequence transformations, the most well known are certainly Richardson's extrapolation algorithm and Aitken's Δ² process. All known methods are constructed
by extrapolation and they are often called extrapolation methods. The idea consists of interpolating the terms S_n, S_{n+1}, ..., S_{n+k} of the sequence to be transformed by a sequence satisfying a certain relationship depending on parameters. This set of sequences is called the kernel of the transformation, and every sequence of this set is transformed into a constant sequence by the transformation under consideration. For example, as we will see below, the kernel of Aitken's Δ² process is the set of sequences satisfying, for all n, a_0(S_n − S) + a_1(S_{n+1} − S) = 0, where a_0 and a_1 are parameters such that a_0 + a_1 ≠ 0. If Aitken's process is applied to such a sequence, then the constant sequence (T_n = S)
is obtained. The parameters involved in the definition of the kernel are uniquely determined by the interpolation conditions, and then the limit of the interpolating sequence of the kernel is taken as an approximation of the limit of the sequence to be accelerated. Since this limit depends on the index n, it will be denoted by T_n. Effectively, the sequence (S_n) has been transformed into a new sequence (T_n).
This paper, which is based on , but includes new developments obtained since 1995, presents
my personal views on the historical development of this subject during the 20th century. I do not
pretend to be exhaustive nor even to quote every important contribution (if a reference does not
appear below, it does not mean that it is less valuable). I refer the interested reader to the literature
and, in particular, to the recent books [55,146,33,144]. For an extensive bibliography, see .
I will begin with scalar sequences and then treat the case of vector ones. As we will see, a
sequence transformation able to accelerate the convergence of all scalar sequences cannot exist.
Thus, it is necessary to obtain many different convergence acceleration methods, each being suitable
for a particular class of sequences. Many authors have studied the properties of these procedures and
proved some important classes of sequences to be accelerable by a given algorithm. Scalar sequence
transformations have also been extensively studied from the theoretical point of view.
The situation is more complicated and more interesting for vector sequences. In the case of a
sequence of vectors, it is always possible to apply a scalar acceleration procedure componentwise.
However, such a strategy does not take into account connections which may exist between the
various components, as in the important case of sequences arising from iterative methods for solving
a system of linear or nonlinear equations.
2. Scalar sequences
Let (Sn ) be a scalar sequence converging to a limit S. As explained above, an extrapolation
method consists of transforming this sequence into a new one, (Tn ), by a sequence transformation
T : (S_n) → (T_n). The transformation T is said to accelerate the convergence of the sequence (S_n) if
and only if
$$\lim_{n\to\infty} \frac{T_n - S}{S_n - S} = 0.$$
We can then say that (Tn ) converges (to S) faster than (Sn ).
The first methods to have been used were linear transformations
$$T_n = \sum_{i=0}^{\infty} a_{ni} S_i, \qquad n = 0, 1, \ldots,$$
where the numbers a_{ni} are constants independent of the terms of the sequence (S_n). Such a linear transformation is usually called a summation process and its properties are completely determined by the matrix A = (a_{ni}). For practical reasons, only a finite number of the coefficients a_{ni} are different from zero for each n. Among such processes are those named after Euler, Cesàro and Hölder. In
the case of linear methods, the convergence of the sequence (Tn ) to S for any converging sequence
(S_n) is governed by the Toeplitz summability theorem; see  for a review. Examples of such processes are
$$T_n = \frac{1}{n+1} \sum_{i=0}^{n} S_i, \qquad T_n = \frac{1}{k+1} \sum_{i=n}^{n+k} S_i.$$
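To make the two averaging processes concrete, here is a minimal Python sketch (the function names are mine, chosen for illustration; both transformations are linear, the matrix A = (a_{ni}) simply holding the averaging weights):

```python
def cesaro_means(S):
    """First process: T_n is the arithmetic mean of S_0, ..., S_n."""
    T, total = [], 0.0
    for n, s in enumerate(S):
        total += s
        T.append(total / (n + 1))
    return T

def sliding_means(S, k):
    """Second process: T_n is the mean of the k+1 terms S_n, ..., S_{n+k}."""
    return [sum(S[n:n + k + 1]) / (k + 1) for n in range(len(S) - k)]
```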
In the second case, the sequence (T_n) also depends on a second index, k, and the convergence has to be studied either when k is fixed and n tends to infinity, or when n is fixed and k tends to infinity.
With respect to convergence acceleration, summation processes are usually only able to accelerate the convergence of restricted classes of sequences and this is why the numerical analysts of the 20th century turned their efforts to nonlinear transformations. However, there is one exception: Richardson's extrapolation process.
2.1. Richardson's process
It seems that the first appearance of a particular case of what is now called the Richardson extrapolation process is due to Christian Huygens (1629–1695). In 1903, Robert Moir Milne (1873) applied the idea of Huygens for computing π. The same idea was exploited again by Karl Kommerell (1871–1948) in his book of 1936 . As explained in , Kommerell can be considered as the real discoverer of Romberg's method although he used this scheme in the context of the computation of π.
Let us now come to the procedures used for improving the accuracy of the trapezoidal rule for computing approximations to a definite integral. In the case of a sufficiently smooth function, the error of this method is given by the Euler–Maclaurin expansion. In 1742, Colin Maclaurin (1698–1746)  showed that its precision could be improved by forming linear combinations of the results obtained with various stepsizes. His procedure can be interpreted as a preliminary version of Romberg's method; see  for a discussion.
In 1900, William Fleetwood Sheppard (1863–1936) used an elimination strategy in the Euler–Maclaurin quadrature formula with h_n = r_n h and 1 = r_0 < r_1 < r_2 < · · · to produce a better approximation to the given integral .
In 1910, combining the results obtained with the stepsizes h and 2h, Lewis Fry Richardson (1881–1953) eliminated the first term in a discretization process using central differences . He called this procedure the deferred approach to the limit or h²-extrapolation. The transformed sequence (T_n) is given by
$$T_n = \frac{h_{n+1}^2 S(h_n) - h_n^2 S(h_{n+1})}{h_{n+1}^2 - h_n^2}.$$
In a 1927 paper  he used the same technique to solve a 6th-order differential eigenvalue problem. His process was called (h², h⁴)-extrapolation. Richardson extrapolation consists of computing the value at 0, denoted by T_k^{(n)}, of the interpolation polynomial of degree at most k which passes through the points (x_n, S_n), ..., (x_{n+k}, S_{n+k}). Using the Neville–Aitken scheme for these interpolation polynomials, we immediately obtain
$$T_{k+1}^{(n)} = \frac{x_{n+k+1} T_k^{(n)} - x_n T_k^{(n+1)}}{x_{n+k+1} - x_n}$$
with T_0^{(n)} = S_n.
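In code, this recurrence translates directly into the construction of an extrapolation table; the short Python sketch below (function name mine) builds the columns (T_k^{(n)}) from the abscissas x_n and the values S_n:

```python
def richardson_table(x, S):
    """Richardson extrapolation via the Neville-Aitken recurrence
    T_{k+1}^{(n)} = (x_{n+k+1} T_k^{(n)} - x_n T_k^{(n+1)}) / (x_{n+k+1} - x_n),
    with T_0^{(n)} = S_n.  Returns the columns of the extrapolation table."""
    table = [list(S)]                        # column 0: T_0^{(n)} = S_n
    for k in range(len(S) - 1):
        prev, col = table[-1], []
        for n in range(len(prev) - 1):
            col.append((x[n + k + 1] * prev[n] - x[n] * prev[n + 1])
                       / (x[n + k + 1] - x[n]))
        table.append(col)
    return table
```

With the choice x_n = h_n² and h_n = h_0/2^n, this table reproduces Romberg's scheme, in line with Laurent's result quoted below.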
Let us mention that Richardson referred to a 1926 paper by Nikolai Nikolaevich Bogolyubov
(born in 1909) and Nikolai Mitrofanovich Krylov (1879–1955) where the procedure (often called
the deferred approach to the limit) can already be found .
In 1955, Werner Romberg (born in 1909) was the first to use repeatedly an elimination approach for improving the accuracy of the trapezoidal rule . He himself refers to the book of Lothar Collatz (1910–1990) of 1951 . The procedure became widely known after the rigorous error
analysis given in 1961 by Friedrich L. Bauer  and the work of Eduard L. Stiefel (1909–1978) . Romberg's derivation of his process was heuristic. It was proved by Pierre-Jean Laurent in 1963  that the process comes out from the Richardson process by choosing x_n = h_n² and h_n = h_0/2^n. Laurent also gave conditions on the choice of the sequence (x_n) in order that the sequences (T_k^{(n)}) tend to S either when k or n tends to infinity. Weaker conditions were given by Michel Crouzeix and Alain L. Mignot in [52, pp. 52–55]. As we shall see below, extensions of Romberg's method to nonsmooth integrands lead to a method called the E-algorithm.
Applications of extrapolation to the numerical solution of ordinary differential equations were studied by H.C. Bolton and H.I. Scoins in 1956 , Roland Bulirsch and Josef Stoer in 1964–1966  and William B. Gragg  in 1965. The case of difference methods for partial differential equations was treated by Guri Ivanovich Marchuk and V.V. Shaidurov . Sturm–Liouville problems are discussed in . Finally, we mention that Heinz Rutishauser (1918–1970) pointed out in 1963  that Romberg's idea can be applied to any sequence as long as the error has an asymptotic expansion of a form similar to the Euler–Maclaurin's.
For a detailed history of the Richardson method, its developments and applications, see [57,77,143].
2.2. Aitken's process and the ε-algorithm
The most popular nonlinear acceleration method is certainly Aitken's Δ² process, which is given by
$$T_n = \frac{S_n S_{n+2} - S_{n+1}^2}{S_{n+2} - 2S_{n+1} + S_n} = S_n - \frac{(S_{n+1} - S_n)^2}{S_{n+2} - 2S_{n+1} + S_n}, \qquad n = 0, 1, \ldots.$$
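As a minimal Python sketch (function name mine):

```python
def aitken(S):
    """Aitken's Delta^2 process: T_n = S_n - (Delta S_n)^2 / (Delta^2 S_n)."""
    T = []
    for n in range(len(S) - 2):
        d1 = S[n + 1] - S[n]                  # Delta S_n
        d2 = S[n + 2] - 2 * S[n + 1] + S[n]   # Delta^2 S_n
        T.append(S[n] - d1 * d1 / d2)
    return T

# S_n = 1 + 0.5^n converges linearly to 1; it lies in the kernel described
# below, so every T_n equals the limit exactly.
print(aitken([1 + 0.5 ** n for n in range(8)]))   # [1.0, 1.0, ..., 1.0]
```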
The method was stated by Alexander Craig Aitken (1895–1967) in 1926 , who used it to accelerate the convergence of Bernoulli's method for computing the dominant zero of a polynomial. Aitken pointed out that the same method was obtained by Hans von Naegelsbach (1838) in 1876 in his study of Fürstenau's method for solving nonlinear equations . The process was also given by James Clerk Maxwell (1831–1879) in his Treatise on Electricity and Magnetism of 1873 . However, neither Naegelsbach nor Maxwell used it for the purpose of acceleration. Maxwell wanted to find the equilibrium position of a pointer oscillating with an exponentially damped simple harmonic motion from three experimental measurements. It is surprising that Aitken's process was known to Takakazu Seki (1642–1708), often considered the greatest Japanese mathematician. In his book Katsuyo Sanpo, Vol. IV, he used this process to compute the value of π, the length of a chord and the volume of a sphere. This book was written around 1680 but only published in 1712 by his disciple Murahide Araki. Parts of it can be found in . Let us mention that the Japanese characters corresponding to Takakazu have another pronunciation, which is Kowa. This is the reason why this mathematician is often called, erroneously as in [29,31], Seki Kowa.
What makes Aitken's process so popular is that it accelerates the convergence of all linearly converging sequences, that is sequences such that there exists a ≠ 1 with
$$\lim_{n\to\infty} \frac{S_{n+1} - S}{S_n - S} = a.$$
It can even accelerate some logarithmic sequences (that is, those corresponding to a = 1), which are the sequences with the slowest convergence and the most difficult to accelerate.
Aitken's Δ² process is exact (which means that T_n = S for all n) for sequences satisfying a_0(S_n − S) + a_1(S_{n+1} − S) = 0 for all n, with a_0 a_1 ≠ 0 and a_0 + a_1 ≠ 0. Such sequences form the kernel of Aitken's process. The idea naturally arose of finding a transformation with the kernel
$$a_0(S_n - S) + \cdots + a_k(S_{n+k} - S) = 0, \qquad \forall n,$$
with a_0 a_k ≠ 0 and a_0 + · · · + a_k ≠ 0. The particular case k = 2 was already treated by Maxwell in his book of 1873 and the case of an arbitrary value of k was studied by T.H. O'Beirne in 1947
. This last work remains almost unknown since it was published only as an internal report. The
problem was handled in full generality by Daniel Shanks (1917–1996) in 1949  and again in 1955 . He obtained the sequence transformation defined by
$$T_n = e_k(S_n) = \frac{\begin{vmatrix} S_n & S_{n+1} & \cdots & S_{n+k} \\ S_{n+1} & S_{n+2} & \cdots & S_{n+k+1} \\ \vdots & \vdots & & \vdots \\ S_{n+k} & S_{n+k+1} & \cdots & S_{n+2k} \end{vmatrix}}{\begin{vmatrix} \Delta^2 S_n & \cdots & \Delta^2 S_{n+k-1} \\ \vdots & & \vdots \\ \Delta^2 S_{n+k-1} & \cdots & \Delta^2 S_{n+2k-2} \end{vmatrix}}.$$
When k = 1, Shanks's transformation reduces to Aitken's Δ² process. It can be proved that e_k(S_n) = S for all n if and only if (S_n) belongs to the kernel of the transformation given above. The same ratios of determinants were obtained by R.J. Schmidt in 1941  in his study of a method for solving systems of linear equations.
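For illustration only, the ratio of Hankel determinants can be evaluated directly with NumPy; this is a sketch of the definition rather than a practical implementation (as the next paragraph explains, computing these determinants numerically is delicate):

```python
import numpy as np

def shanks(S, k, n):
    """e_k(S_n) as the ratio of the two Hankel determinants above;
    requires the terms S_n, ..., S_{n+2k}."""
    S = np.asarray(S, dtype=float)
    num = np.array([[S[n + i + j] for j in range(k + 1)]
                    for i in range(k + 1)])
    d2 = S[2:] - 2 * S[1:-1] + S[:-2]        # second differences Delta^2 S_m
    den = np.array([[d2[n + i + j] for j in range(k)]
                    for i in range(k)])
    return np.linalg.det(num) / np.linalg.det(den)

# k = 1 reproduces Aitken's Delta^2 process; higher k accelerates further.
S = [sum((-1) ** i / (i + 1) for i in range(m + 1)) for m in range(10)]
print(shanks(S, 2, 0))   # partial sums of log 2: close to 0.6931...
```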
The determinants involved in the definition of e_k(S_n) have a very special structure. They are called Hankel determinants and were studied by Hermann Hankel (1839–1873) in his thesis in 1861 . Such determinants satisfy a five-term recurrence relationship. This relation was used by O'Beirne and Shanks to implement the transformation by computing separately the numerators and the denominators of the e_k(S_n)'s. However, numerical analysts know it is difficult to compute