
Now choose β so large that

    4κ/β ≤ 1/2;    (10.20)

i.e., let β ≥ 8κ. Then (10.19) implies that

    sup_{tn ≤ t ≤ tn+1} |v(t)| ≤ 2|v(tn)| + 4|b(x(tn))| + 4κ sup_{tn ≤ t ≤ tn+1} |w(t) − w(tn)| + 2 sup_{tn ≤ t ≤ tn+1} |∫_{tn}^{t} e^{−β(t−s)} dw(s)|.    (10.21)

Define

    ηn = sup_{tn ≤ t ≤ tn+1} |v(t)|/β,    (10.22)

    ζn = |b(x(tn))|/β,    (10.23)

    εn = (2/β) sup_{tn ≤ t ≤ tn+1} |∫_{tn}^{t} e^{−β(t−s)} dw(s)| + (4κ/β) sup_{tn ≤ t ≤ tn+1} |w(t) − w(tn)|.    (10.24)

Recall that our task is to show that ηn → 0 with probability one for all n
as β → ∞. Suppose we can show that

    εn → 0    (10.25)

with probability one for all n as β → ∞. By (10.21),

    ηn ≤ 2ηn−1 + 4ζn + εn    (10.26)

where η−1 = |v0|/β, and by (10.18) for n − 1 and (10.20),

    ζn ≤ 2ζn−1 + (1/2)ηn−1 + (1/2)εn.    (10.27)
Now ζ0 = |b(x0)|/β → 0 and η−1 = |v0|/β → 0, so η0 → 0 by (10.26)
and (10.25); then ζ1 → 0 by (10.27) and (10.25), and consequently
η1 → 0. By induction, ζn → 0 and ηn → 0 for all n. Therefore, we need
only prove (10.25).
It is clear that the second term on the right hand side of (10.24)
converges to 0 with probability one as β ’ ∞, since w is continuous with
probability one. Let

    z(t) = w(t) − w(tn) for t ≥ tn,    z(t) = 0 for t < tn.

Then, for tn ≤ t ≤ tn+1, integration by parts gives

    ∫_{tn}^{t} e^{−β(t−s)} dw(s) = ∫_{tn}^{t} e^{−β(t−s)} dz(s) = −β ∫_{tn}^{t} e^{−β(t−s)} z(s) ds + z(t).

This converges to 0 uniformly for tn ≤ t ≤ tn+1 with probability one, since
z is continuous with probability one. Therefore (10.25) holds. QED.
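The convergence just used can be illustrated numerically. The sketch below is not part of the text: it assumes a unit time interval, a discretized Wiener path, and the standard one-step exponential recursion as a stand-in for the Itô integral, and shows sup_t |∫_0^t e^{−β(t−s)} dw(s)| shrinking as β grows.

```python
import numpy as np

# Illustration (not from the text): for one fixed sample path of w,
# sup over t of |int_0^t e^{-beta(t-s)} dw(s)| shrinks as beta grows,
# which is the convergence used in the proof of (10.25).
rng = np.random.default_rng(0)
n = 2000
dt = 1.0 / n
dw = rng.normal(0.0, np.sqrt(dt), n)  # increments of a standard Wiener path

def sup_exp_integral(beta):
    """sup over t of |I(t)|, using the one-step Euler recursion
    I(t_{k+1}) = e^{-beta*dt} I(t_k) + dw_k for the stochastic integral
    int_0^t e^{-beta(t-s)} dw(s)."""
    decay = np.exp(-beta * dt)
    i_val, sup = 0.0, 0.0
    for inc in dw:
        i_val = decay * i_val + inc
        sup = max(sup, abs(i_val))
    return sup

sups = {beta: sup_exp_integral(beta) for beta in (10.0, 100.0, 1000.0)}
print(sups)  # the sup decreases as beta increases
```

The recursion exploits the semigroup property of e^{−β(t−s)}, so no O(n²) quadrature is needed.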

A possible physical objection to the theorem is that the initial velocity
v0 should not be held fixed as β varies but should have a Maxwellian
distribution (Gaussian with mean 0 and variance Dβ). Let v00 have a
Maxwellian distribution for a fixed value β = β0. Then v0 = (β/β0)^{1/2} v00
has a Maxwellian distribution for all β. Since it is still true that v0/β → 0
as β → ∞, the theorem remains true with a Maxwellian initial velocity.
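Theorem 10.1 itself can be probed by simulation. The sketch below is an assumption-laden illustration, not the book's construction: it takes b(x) = −x, v0 = 0, and a unit diffusion scale, integrates dx = v dt, dv = β(b(x) − v) dt + β dw by Euler-Maruyama, and compares x with the Smoluchowski path dy = b(y) dt + dw driven by the same Wiener increments.

```python
import numpy as np

# Sketch (assumptions: b(x) = -x, v0 = 0, unit diffusion scale): with the
# same Wiener path w, the position x of the fast-relaxation process
#   dx = v dt,  dv = beta*(b(x) - v) dt + beta*dw
# approaches the Smoluchowski path dy = b(y) dt + dw as beta grows.
rng = np.random.default_rng(1)
n = 10_000
dt = 1.0 / n
dw = rng.normal(0.0, np.sqrt(dt), n)
b = lambda u: -u  # a globally Lipschitz force field (kappa = 1)

def max_gap(beta, x0=1.0, v0=0.0):
    """Euler-Maruyama for both processes; returns sup_t |x(t) - y(t)|."""
    x, v, y, gap = x0, v0, x0, 0.0
    for inc in dw:
        x, v = x + v * dt, v + beta * (b(x) - v) * dt + beta * inc
        y = y + b(y) * dt + inc
        gap = max(gap, abs(x - y))
    return gap

print(max_gap(5.0), max_gap(200.0))  # the gap shrinks as beta grows
```

The driving increments dw are shared by all three paths, matching the pathwise (with probability one) convergence asserted by the theorem.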
Theorem 10.1 has a corollary that can be expressed purely in the lan-
guage of partial differential equations:

PSEUDOTHEOREM 10.2 Let b : ℝ^ℓ → ℝ^ℓ satisfy a global Lipschitz con-
dition, and let D and β be strictly positive constants. Let f0 be a bounded
continuous function on ℝ^ℓ. Let f on [0, ∞) × ℝ^ℓ be the bounded solution of

    ∂f/∂t (t, x) = [D∆_x + b(x)·∇_x] f(t, x);    f(0, x) = f0(x).    (10.28)

Let gβ on [0, ∞) × ℝ^ℓ × ℝ^ℓ be the bounded solution of

    ∂gβ/∂t (t, x, v) = [β²D∆_v + v·∇_x + β(b(x) − v)·∇_v] gβ(t, x, v);
    gβ(0, x, v) = f0(x).    (10.29)

Then for all t, x, and v,

    lim_{β→∞} gβ(t, x, v) = f(t, x).    (10.30)

To prove this, notice that f(t, x0) = E f0(y(t)) and gβ(t, x0, v0) =
E f0(x(t)), since (10.28) and (10.29) are the backward Kolmogorov equa-
tions of the two processes. The result follows from Theorem 10.1 and the
Lebesgue dominated convergence theorem.
There is nothing wrong with this proof; only the formulation of the
result is at fault. Equation (10.28) is a parabolic equation with smooth
coefficients, and it is a classical result that it has a unique bounded so-
lution. However, (10.29) is not parabolic (it is of first order in x), so we
do not know that it has a unique bounded solution. One way around this
problem would be to let gβ,ε be the unique bounded solution of (10.29)
with the additional operator ε∆_x on the right-hand side and to prove that
gβ,ε(t, x0, v0) → gβ(t, x0, v0) = E f0(x(t)) as ε → 0. This would give us
a characterization of gβ purely in terms of partial differential equations.
We shall not do this.


[26]. Eugen Kappler, Versuche zur Messung der Avogadro-Loschmidt-
schen Zahl aus der Brownschen Bewegung einer Drehwaage, Annalen der
Physik, 11 (1931), 233–256.
Chapter 11

Kinematics of stochastic motion

We shall investigate the kinematics of motion in which chance plays a
rôle (stochastic motion).
Let x(t) be the position of a particle at time t. What does it mean
to say that the particle has a velocity ẋ(t)? It means that if ∆t is a very
short time interval then

    x(t + ∆t) − x(t) = ẋ(t)∆t + ε,

where ε is a very small percentage error. This is an assumption about
the actual motion of particles that may not be true. Let us be conservative
and suppose that it is not necessarily true. (“Conservative” is a useful
word for mathematicians. It is used when introducing a hypothesis that
a physicist would regard as highly implausible.)
The particle should have some tendency to persist in uniform rectilin-
ear motion for very small intervals of time. Let us use Dx(t) to denote
the best prediction we can make, given any relevant information available
at time t, of

    (x(t + ∆t) − x(t))/∆t

for infinitely small positive ∆t.
Let us make this notion precise.
Let I be an interval that is open on the right, let x be an ℝ^ℓ-valued
stochastic process indexed by I, and let Pt for t in I be an increasing
family of σ-algebras such that each x(t) is Pt-measurable. (This implies
that Pt contains the σ-algebra generated by the x(s) with s ≤ t, s ∈ I.
Conversely, the family of σ-algebras generated in this way satisfies the
hypotheses.)
have occasion to introduce various regularity assumptions, denoted by
(R0), (R1), etc.

(R0). Each x(t) is in L¹ and t → x(t) is continuous from I into L¹.

This is a very weak assumption and by no means implies that the
sample functions (trajectories) of the x process are continuous.

(R1). The condition (R0) holds and for each t in I,

    Dx(t) = lim_{∆t→0+} E{ (x(t + ∆t) − x(t))/∆t | Pt }

exists as a limit in L¹, and t → Dx(t) is continuous from I into L¹.

Here E{ · | Pt } denotes the conditional expectation; cf. Doob [15, §6].
The notation ∆t → 0+ means that ∆t tends to 0 through positive values.
The random variable Dx(t) is automatically Pt-measurable. It is called
the mean forward derivative (or mean forward velocity if x(t) represents
the position of a particle at time t).
As an example of an (R1) process, let I = (−∞, ∞), let x(t) be the
position in the Ornstein-Uhlenbeck process, and let Pt be the σ-algebra
generated by the x(s) with s ≤ t. Then Dx(t) = dx(t)/dt = v(t). In
fact, if t → x(t) has a continuous strong derivative dx(t)/dt in L¹, then
Dx(t) = dx(t)/dt. A second example of an (R1) process is a process x(t)
of the form discussed in Theorem 8.1, with I = [0, ∞), x(0) = x0, and
Pt the σ-algebra generated by the x(s) with 0 ≤ s ≤ t. In this case
Dx(t) = b(x(t)). The derivative dx(t)/dt does not exist in this exam-
ple unless w is identically 0. For a third example, let P^t be a Markovian
semigroup on a locally compact Hausdorff space X with infinitesimal gen-
erator A, let I = [0, ∞), let ξ(t) be the X-valued random variables of the
Markov process for some initial measure, and let Pt be the σ-algebra
generated by the ξ(s) with 0 ≤ s ≤ t. If f is in the domain of the
infinitesimal generator A then x(t) = f(ξ(t)) is an (R1) process, and
Df(ξ(t)) = Af(ξ(t)).
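The second example can be checked by simulation. In the sketch below (illustrative only; the drift b(x) = −x, the present state x(t) = 1, and the step sizes are assumptions, not from the text), conditioning on Pt is implemented by restarting many independent copies of the process from the same present state, which is legitimate because the process is Markovian; the averaged difference quotient should approach b(x(t)).

```python
import numpy as np

# Monte Carlo check (illustrative) that Dx(t) = b(x(t)) for dx = b(x)dt + dw:
# for a Markov process, conditioning on P_t amounts to fixing the present
# state, so we average the difference quotient over fresh noise.
rng = np.random.default_rng(2)
b = lambda u: -u          # assumed drift b(x) = -x
x_now = 1.0               # the (known) present state x(t)
dt_small, substeps, m = 0.01, 10, 200_000
h = dt_small / substeps

x = np.full(m, x_now)
for _ in range(substeps):  # Euler-Maruyama over [t, t + dt_small]
    x = x + b(x) * h + rng.normal(0.0, np.sqrt(h), m)

estimate = np.mean((x - x_now) / dt_small)
print(estimate, b(x_now))  # the estimate should be close to b(x(t)) = -1
```

Each sample of the difference quotient has standard deviation of order 1/√∆t, so many samples are needed; the ordinary derivative dx/dt indeed fails to exist, but the conditional mean is well behaved.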

THEOREM 11.1 Let x be an (R1) process, and let a ≤ b, a ∈ I, b ∈ I.
Then

    E{ x(b) − x(a) | Pa } = E{ ∫_a^b Dx(s) ds | Pa }.    (11.1)

Notice that since s → Dx(s) is continuous in L¹, the integral exists
as a Riemann integral in L¹.
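For a concrete unconditional check of (11.1) (taking Pa trivial, so that both sides become plain expectations), the sketch below assumes the process of the second example with drift −x, start a = 0 and end b = 1, so Dx(t) = −x(t), and estimates E[x(1) − x(0)] and E[∫_0^1 Dx(s) ds] by Monte Carlo; these details are illustrative assumptions, not part of the text.

```python
import numpy as np

# Monte Carlo illustration of (11.1) with trivial conditioning: for
# dx = drift(x) dt + dw with Dx(t) = drift(x(t)),
# E[x(1) - x(0)] should equal E[ int_0^1 drift(x(s)) ds ].
rng = np.random.default_rng(3)
drift = lambda u: -u             # assumed drift, so Dx(t) = -x(t)
m, n = 20_000, 200
dt = 1.0 / n
x = np.full(m, 1.0)              # x(0) = 1 for every path
drift_integral = np.zeros(m)
for _ in range(n):               # Euler-Maruyama on [0, 1]
    drift_integral += drift(x) * dt  # left-endpoint Riemann sum of Dx(s) ds
    x = x + drift(x) * dt + rng.normal(0.0, np.sqrt(dt), m)

lhs = np.mean(x - 1.0)           # E[x(1) - x(0)]
rhs = np.mean(drift_integral)    # E[int_0^1 Dx(s) ds]
print(lhs, rhs)
```

Path by path the two quantities differ by the accumulated Wiener increments, which have mean zero; that is exactly why (11.1) holds in expectation.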

Proof. Let ε > 0 and let J be the set of all t in [a, b] such that

    ‖ E{ x(s) − x(a) | Pa } − E{ ∫_a^s Dx(r) dr | Pa } ‖₁ ≤ ε(s − a)    (11.2)

for all a ≤ s ≤ t, where ‖ ‖₁ denotes the L¹ norm. Clearly, a is in J,
and J is a closed subinterval of [a, b]. Let t be the right end-point of J,
and suppose that t < b. By the definition of Dx(t), there is a δ > 0 such
that t + δ ≤ b and

    ‖ E{ x(t + ∆t) − x(t) | Pt } − Dx(t)∆t ‖₁ ≤ (ε/2)∆t

for 0 ≤ ∆t ≤ δ. Since conditional expectations reduce the L¹ norm and
since Pt ∩ Pa = Pa,

    ‖ E{ x(t + ∆t) − x(t) | Pa } − E{ Dx(t)∆t | Pa } ‖₁ ≤ (ε/2)∆t    (11.3)

for 0 ≤ ∆t ≤ δ. By reducing δ if necessary, we find

    ‖ Dx(t)∆t − ∫_t^{t+∆t} Dx(s) ds ‖₁ ≤ (ε/2)∆t

for 0 ≤ ∆t ≤ δ, since s → Dx(s) is L¹ continuous. Therefore,

    ‖ E{ Dx(t)∆t | Pa } − E{ ∫_t^{t+∆t} Dx(s) ds | Pa } ‖₁ ≤ (ε/2)∆t    (11.4)

for 0 ≤ ∆t ≤ δ. From (11.2) for s = t, (11.3), and (11.4), it follows
that (11.2) holds for all t + ∆t with 0 ≤ ∆t ≤ δ. This contradicts the
assumption that t is the right end-point of J, so we must have t = b. Since ε

