E [E [X | H] | G] = E [X | H] = E [E [X | G] | H].

Proof. E [X | H] is H measurable, hence G measurable, since H ⊂ G. The left hand
equality now follows by Proposition 3.5(3). To get the right hand equality, let W be the
right hand expression. It is H measurable, and if C ∈ H ⊂ G, then

E [W ; C] = E [E [X | G]; C] = E [X; C]

as required.

In words, if we are predicting a prediction of X given limited information, this is
the same as a single prediction given the least amount of information.
Let us verify that conditional expectation may be viewed as the best predictor of
a random variable given a σ-field. If X is a r.v., a predictor Z is just another random
variable, and the goodness of the prediction will be measured by E [(X − Z)^2], which is
known as the mean square error.
Proposition 3.8. If X is a r.v., the best predictor among the collection of G-measurable
random variables is Y = E [X | G].

Proof. Let Z be any G-measurable random variable. We compute, using Proposition
3.5(3) and Proposition 3.6,

E [(X ā’ Z)2 | G] = E [X 2 | G] ā’ 2E [XZ | G] + E [Z 2 | G]
= E [X 2 | G] ā’ 2ZE [X | G] + Z 2
= E [X 2 | G] ā’ 2ZY + Z 2
= E [X 2 | G] ā’ Y 2 + (Y ā’ Z)2
= E [X 2 | G] ā’ 2Y E [X | G] + Y 2 + (Y ā’ Z)2
= E [X 2 | G] ā’ 2E [XY | G] + E [Y 2 | G] + (Y ā’ Z)2
= E [(X ā’ Y )2 | G] + (Y ā’ Z)2 .

We also used the fact that Y is G measurable. Taking expectations and using Proposition
3.5(4),
E [(X ā’ Z)2 ] = E [(X ā’ Y )2 ] + E [(Y ā’ Z)2 ].

The right hand side is bigger than or equal to E [(X − Y)^2] because (Y − Z)^2 ≥ 0. So the
error in predicting X by Z is larger than the error in predicting X by Y, and will be equal
if and only if Z = Y. So Y is the best predictor.
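Proposition 3.8 can be checked numerically. In this sketch of ours, X = U + ε with U and ε independent standard normals, and G is the σ-field generated by U, so E [X | G] = U; the predictors, sample size, and seed are our own choices.

```python
import random

random.seed(0)

# Our toy model: X = U + eps, with U and eps independent standard normals.
# G is the sigma-field generated by U, so E[X | G] = U.
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

def mse(predictor):
    # estimate E[(X - Z)^2] by the sample average, where Z = predictor(U)
    return sum((u + e - predictor(u)) ** 2 for u, e in samples) / len(samples)

best = mse(lambda u: u)          # Z = E[X | G] = U
other1 = mse(lambda u: 0.5 * u)  # another G-measurable predictor
other2 = mse(lambda u: u + 0.3)

print(best, other1, other2)
```

The estimate for Y = U comes out near Var(ε) = 1, while both competing G-measurable predictors do strictly worse, as the proposition predicts.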

There is one more interpretation of conditional expectation that may be useful. The
collection of all random variables is a linear space, and the collection of all G-measurable
random variables is clearly a subspace. Given X, the conditional expectation Y = E [X | G]
is equal to the projection of X onto the subspace of G-measurable random variables. To
see this, we write X = Y + (X − Y), and what we have to check is that the inner product
of Y and X − Y is 0, that is, Y and X − Y are orthogonal. In this context, the inner
product of X1 and X2 is defined to be E [X1 X2], so we must show E [Y(X − Y)] = 0. Note

E [Y (X ā’ Y ) | G] = Y E [X ā’ Y | G] = Y (E [X | G] ā’ Y ) = Y (Y ā’ Y ) = 0.

Taking expectations,

E [Y (X ā’ Y )] = E [E [Y (X ā’ Y ) | G] ] = 0,

just as we wished.

If Y is a discrete random variable, that is, it takes only countably many values
y1, y2, . . ., we let Bi = (Y = yi). These will be disjoint sets whose union is Ω. If σ(Y)
is the collection of all unions of the Bi, then σ(Y) is a σ-field, and is called the σ-field
generated by Y. It is easy to see that this is the smallest σ-field with respect to which Y
is measurable. We write E [X | Y] for E [X | σ(Y)].
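For a discrete Y this conditional expectation can be computed directly: on each set Bi = (Y = yi), E [X | Y] equals the average of X over Bi. A small sketch on a finite probability space with equally likely points (the sample points and values are our own toy example, not from the text):

```python
# Six equally likely sample points; Y takes two values, 'a' and 'b'.
# The sets Bi = (Y = yi) partition the space, and E[X | Y] is constant
# on each Bi, equal to the average of X over that set.
X = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0}
Y = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}

def cond_exp(X, Y):
    # average X over each level set of Y
    result = {}
    for y in set(Y.values()):
        block = [w for w in X if Y[w] == y]
        avg = sum(X[w] for w in block) / len(block)
        for w in block:
            result[w] = avg
    return result

E_X_given_Y = cond_exp(X, Y)
print(E_X_given_Y)
```

Here E [X | Y] is 2.0 on (Y = 'a') and 5.0 on (Y = 'b'), and averaging it over all six points recovers E X, in line with Proposition 3.5(4).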

Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove
(3), suppose X is G measurable and let Z = X; then Z is G measurable and E [X; C] = E [Z; C]
for any C ∈ G, trivially. By Proposition 3.4 it follows that Z = E [X | G]; this proves (3). To
prove (4), if we let C = Ω and Y = E [X | G], then E Y = E [Y; C] = E [X; C] = E X.
Last is (5). Let Z = E X. Z is constant, so clearly G measurable. By the inde-
pendence, if C ∈ G, then E [X; C] = E [X 1C] = (E X)(E 1C) = (E X)(P(C)). But
E [Z; C] = (E X)(P(C)) since Z is constant. By Proposition 3.4 we see Z = E [X | G].

Note 2. We prove Proposition 3.6. Note that Z E [X | G] is G measurable, so by Proposition
3.4 we need to show its expectation over sets C in G is the same as that of XZ. As in the
proof of Proposition 3.3, it suffices to consider only the case when C is one of the Bi. Now Z
is G measurable, hence it is constant on Bi; let its value be zi. Then

E [Z E [X | G]; Bi] = E [zi E [X | G]; Bi] = zi E [E [X | G]; Bi] = zi E [X; Bi] = E [XZ; Bi]

as desired.

4. Martingales.
Suppose we have a sequence of σ-fields F1 ⊂ F2 ⊂ F3 ⊂ · · ·. An example would be
repeatedly tossing a coin and letting Fk be the sets that can be determined by the first
k tosses. Another example is to let Fk be the events that are determined by the values
of a stock at times 1 through k. A third example is to let X1, X2, . . . be a sequence of
random variables and let Fk be the σ-field generated by X1, . . . , Xk, the smallest σ-field
with respect to which X1, . . . , Xk are measurable.

Deļ¬nition 4.1. A r.v. X is integrable if E |X| < ā. Given an increasing sequence of
Ļ-ļ¬elds Fn , a sequence of r.v.ā™s Xn is adapted if Xn is Fn measurable for each n.

Deļ¬nition 4.2. A martingale Mn is a sequence of random variables such that

(1) Mn is integrable for all n,
(2) Mn is adapted to Fn , and
(3) for all n
E [Mn+1 | Fn ] = Mn . (4.1)

Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. If
we have (1) and (2), but instead of (3) we have
(3′) for all n
E [Mn+1 | Fn] ≥ Mn,

then we say Mn is a submartingale. If we have (1) and (2), but instead of (3) we have
(3″) for all n
E [Mn+1 | Fn] ≤ Mn,

then we say Mn is a supermartingale.
Submartingales tend to increase and supermartingales tend to decrease. The
nomenclature may seem like it goes the wrong way; Doob defined these terms by anal-
ogy with the notions of subharmonic and superharmonic functions in analysis. (Actually,
it is more than an analogy: we won't explore this, but it turns out that the composition
of a subharmonic function with Brownian motion yields a submartingale, and similarly for
superharmonic functions.)
Note that the deļ¬nition of martingale depends on the collection of Ļ-ļ¬elds. When
it is needed for clarity, one can say that (Mn , Fn ) is a martingale. To deļ¬ne conditional
expectation, one needs a probability, so a martingale depends on the probability as well.
When we need to, we will say that Mn is a martingale with respect to the probability P.
This is an issue when there is more than one probability around.
We will see that martingales are ubiquitous in ļ¬nancial math. For example, security
prices and oneā™s wealth will turn out to be examples of martingales.

The word āmartingaleā is also used for the piece of a horseā™s bridle that runs from
the horseā™s head to its chest. It keeps the horse from raising its head too high. It turns out
that martingales in probability cannot get too large. The word also refers to a gambling
system. I did some searching on the Internet, and there seems to be no consensus on the
derivation of the term.
Here is an example of a martingale. Let X1, X2, . . . be a sequence of independent
r.v.'s with mean 0. (Saying a r.v. Xi has mean 0 is the same as saying E Xi = 0; this
presupposes that E |Xi| is finite.) Set Fn = σ(X1, . . . , Xn), the σ-field generated by
X1, . . . , Xn. Let Mn = Σ_{i=1}^n Xi. Definition 4.2(2) is easy to see. Since
E |Mn| ≤ Σ_{i=1}^n E |Xi|, Definition 4.2(1) also holds. We now check

E [Mn+1 | Fn] = X1 + · · · + Xn + E [Xn+1 | Fn] = Mn + E Xn+1 = Mn,

where we used the independence.
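Since a martingale has constant expectation (E Mn = E M0 for all n, as shown in Section 5), a quick simulation sketch of ours can check this for the partial-sum example; we take fair ±1 steps as the mean-zero Xi, with trial count and seed of our own choosing.

```python
import random

random.seed(1)

# Mn = X1 + ... + Xn with fair +/-1 steps (our stand-in for mean-zero Xi).
# A martingale has constant expectation, so E Mn should stay near 0.
trials = 50_000
n = 20
means = [0.0] * (n + 1)
for _ in range(trials):
    m = 0.0
    for k in range(1, n + 1):
        m += random.choice([-1.0, 1.0])  # independent step with mean 0
        means[k] += m
means = [s / trials for s in means]    # sample estimate of E Mk for each k
print(means[5], means[20])
```

Both estimates stay near 0, consistent with E Mn = 0 for every n.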
Another example: suppose in the above that the Xk all have variance 1, and let
Mn = Sn^2 − n, where Sn = Σ_{i=1}^n Xi. Again (1) and (2) of Definition 4.2 are easy to check.
We compute

E [Mn+1 | Fn] = E [Sn^2 + 2Xn+1 Sn + Xn+1^2 | Fn] − (n + 1).

We have E [Sn^2 | Fn] = Sn^2 since Sn is Fn measurable. Next,

E [2Xn+1 Sn | Fn] = 2Sn E [Xn+1 | Fn] = 2Sn E Xn+1 = 0.

And E [Xn+1^2 | Fn] = E Xn+1^2 = 1. Substituting, we obtain E [Mn+1 | Fn] = Mn, so Mn is
a martingale.
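Since E M1 = E [X1^2] − 1 = 0, the constant-expectation property of this martingale forces E [Sn^2] = n for every n. A simulation sketch of ours, again with fair ±1 steps standing in for the variance-1 Xi:

```python
import random

random.seed(4)

# With fair +/-1 steps (variance 1), Mn = Sn^2 - n is a martingale,
# and E M1 = 0 forces E[Sn^2] = n for every n.
trials = 100_000
n = 15
total = 0.0
for _ in range(trials):
    s = 0
    for _ in range(n):
        s += random.choice([-1, 1])
    total += s * s
estimate = total / trials - n  # estimate of E Mn, should be near 0
print(estimate)
```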
A third example: Suppose you start with a dollar and you are tossing a fair coin
independently. If it turns up heads you double your fortune, tails you go broke. This is
"double or nothing." Let Mn be your fortune at time n. To formalize this, let X1, X2, . . .
be independent r.v.'s that are equal to 2 with probability 1/2 and 0 with probability 1/2.
Then Mn = X1 · · · Xn. Let Fn be the σ-field generated by X1, . . . , Xn. Note 0 ≤ Mn ≤ 2^n,
and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional
expectation, note E Xn+1 = 1. Then

E [Mn+1 | Fn] = Mn E [Xn+1 | Fn] = Mn E Xn+1 = Mn,

using the independence.
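The double-or-nothing fortune illustrates how a martingale can have constant expectation even though almost every path eventually hits 0. A simulation sketch of ours (trial count and seed are our choices):

```python
import random

random.seed(2)

# Double or nothing: each Xi is 2 or 0 with probability 1/2 each, and
# Mn = X1 * ... * Xn. Most paths go broke, yet E Mn = 1 for every n.
trials = 200_000
n = 10
total = 0.0
broke = 0
for _ in range(trials):
    m = 1.0
    for _ in range(n):
        m *= random.choice([0.0, 2.0])
    total += m
    if m == 0.0:
        broke += 1
mean_fortune = total / trials
print(mean_fortune, broke / trials)
```

The estimated mean fortune stays near 1, while the fraction of broke paths is about 1 − 2^{−10} ≈ 0.999: the rare surviving paths, each worth 2^n, exactly balance the mass of ruined ones.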
Before we give our fourth example, let us observe that

|E [X | F]| ≤ E [|X| | F]. (4.2)

To see this, we have −|X| ≤ X ≤ |X|, so −E [|X| | F] ≤ E [X | F] ≤ E [|X| | F]. Since
E [|X| | F] is nonnegative, (4.2) follows.
Our fourth example will be used many times, so we state it as a proposition.

Proposition 4.3. Let F1, F2, . . . be given and let X be a fixed r.v. with E |X| < ∞. Let
Mn = E [X | Fn]. Then Mn is a martingale.

Proof. Deļ¬nition 4.2(2) is clear, while

E |Mn | ā¤ E [E [|X| | Fn ]] = E |X| < ā

by (4.2); this shows Deļ¬nition 4.2(1). We have

E [Mn+1 | Fn ] = E [E [X | Fn+1 ] | Fn ] = E [X | Fn ] = Mn .

5. Properties of martingales.
When it comes to discussing American options, we will need the concept of stopping
times. A mapping τ from Ω into the nonnegative integers is a stopping time if (τ = k) ∈ Fk
for each k.
An example is τ = min{k : Sk ≥ A}. This is a stopping time because (τ = k) =
(S1, . . . , Sk−1 < A, Sk ≥ A) ∈ Fk. We can think of a stopping time as the first time
something happens. ρ = max{k : Sk ≥ A}, the last time, is not a stopping time. (We will
use the convention that the minimum of an empty set is +∞; so, for example, with the
above definition of τ, on the event that Sk never reaches A, we have τ = ∞.)
Here is an intuitive description of a stopping time. If I tell you to drive to the city
limits and then drive until you come to the second stop light after that, you know when
you get there that you have arrived; you don't need to have been there before or to look
ahead. But if I tell you to drive until you come to the second stop light before the city
limits, either you must have been there before or else you have to go past where you are
supposed to stop, continue on to the city limits, and then turn around and come back two
stop lights. You don't know when you first get to the second stop light before the city
limits that you get to stop there. The first set of instructions forms a stopping time, the
second set does not.
Note (Ļ„ ā¤ k) = āŖk (Ļ„ = j). Since (Ļ„ = j) ā Fj ā‚ Fk , then the event (Ļ„ ā¤ k) ā Fk
j=0
for all k. Conversely, if Ļ„ is a r.v. with (Ļ„ ā¤ k) ā Fk for all k, then

(Ļ„ = k) = (Ļ„ ā¤ k) ā’ (Ļ„ ā¤ k ā’ 1).

Since (Ļ„ ā¤ k) ā Fk and (Ļ„ ā¤ k ā’ 1) ā Fkā’1 ā‚ Fk , then (Ļ„ = k) ā Fk , and such a Ļ„ must
be a stopping time.
Our ļ¬rst result is Jensenā™s inequality.
Proposition 5.1. If g is convex, then

g(E [X | G]) ā¤ E [g(X) | G]

provided all the expectations exist.
For ordinary expectations rather than conditional expectations, this is still true.
That is, if g is convex and the expectations exist, then

g(E X) ā¤ E [g(X)].

We already know some special cases of this: when g(x) = |x|, this says |E X| ā¤ E |X|;
when g(x) = x2 , this says (E X)2 ā¤ E X 2 , which we know because E X 2 ā’ (E X)2 =
E (X ā’ E X)2 ā„ 0.

For Proposition 5.1 as well as many of the following propositions, the statement of
the result is more important than the proof, and we relegate the proof to Note 1 below.
One reason we want Jensenā™s inequality is to show that a convex function applied
to a martingale yields a submartingale.
Proposition 5.2. If Mn is a martingale and g is convex, then g(Mn ) is a submartingale,
provided all the expectations exist.

Proof. By Jensenā™s inequality,

E [g(Mn+1 ) | Fn ] ā„ g(E [Mn+1 | Fn ]) = g(Mn ).

If Mn is a martingale, then E Mn = E [E [Mn+1 | Fn]] = E Mn+1. So E M0 =
E M1 = · · · = E Mn. Doob's optional stopping theorem says the same thing holds when
fixed times n are replaced by stopping times.
Theorem 5.3. Suppose K is a positive integer, N is a stopping time such that N ≤ K
a.s., and Mn is a martingale. Then

E MN = E MK .

Here, to evaluate MN, one first finds N(ω) and then evaluates M·(ω) for that value of N.
Proof. We have
E MN = Σ_{k=0}^{K} E [MN ; N = k].

If we show that the k-th summand is E [MK ; N = k], then the sum will be

Σ_{k=0}^{K} E [MK ; N = k] = E MK

as desired. We have
E [MN ; N = k] = E [Mk ; N = k]

by the deļ¬nition of MN . Now (N = k) is in Fk , so by Proposition 2.2 and the fact that
Mk = E [Mk+1 | Fk ],
E [Mk ; N = k] = E [Mk+1 ; N = k].

We have (N = k) ∈ Fk ⊂ Fk+1. Since Mk+1 = E [Mk+2 | Fk+1], Proposition 2.2 tells us
that
E [Mk+1 ; N = k] = E [Mk+2 ; N = k].
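Iterating this argument raises the index one step at a time until the summand becomes E [MK ; N = k], which is what the proof needs. Theorem 5.3 can also be checked by simulation; in this sketch of ours, Mn is a fair ±1 walk started at 0 and N is the first time the walk hits level 3, capped at K so that N ≤ K (level, cap, and seed are our own choices).

```python
import random

random.seed(3)

# Fair +/-1 walk Mn started at 0; stopping time N = first time Mn hits 3,
# capped at K, so N <= K always. Theorem 5.3 gives E M_N = E M_K = 0.
trials = 100_000
K = 25
total = 0.0
for _ in range(trials):
    m = 0
    for _ in range(K):
        m += random.choice([-1, 1])
        if m == 3:      # the stopping time N occurs here
            break
    total += m          # m is M_N (3 if the level was hit, else M_K)
mean_MN = total / trials
print(mean_MN)
```

Even though many paths stop at the favorable value 3, the paths that never reach 3 drag the average back: the estimate of E MN stays near E M0 = 0.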
