
Proof. E [X | H] is H measurable, hence G measurable, since H ⊂ G. The left hand equality now follows by Proposition 3.5(3). To get the right hand equality, let W be the right hand expression. It is H measurable, and if C ∈ H ⊂ G, then

E [W ; C] = E [E [X | G]; C] = E [X; C]

as required.

In words, if we are predicting a prediction of X given limited information, this is

the same as a single prediction given the least amount of information.

Let us verify that conditional expectation may be viewed as the best predictor of a random variable given a σ-field. If X is a r.v., a predictor Z is just another random variable, and the goodness of the prediction will be measured by E [(X − Z)²], which is known as the mean square error.

Proposition 3.8. If X is a r.v., the best predictor among the collection of G-measurable

random variables is Y = E [X | G].

Proof. Let Z be any G-measurable random variable. We compute, using Proposition

3.5(3) and Proposition 3.6,

E [(X ā’ Z)2 | G] = E [X 2 | G] ā’ 2E [XZ | G] + E [Z 2 | G]

= E [X 2 | G] ā’ 2ZE [X | G] + Z 2

= E [X 2 | G] ā’ 2ZY + Z 2

= E [X 2 | G] ā’ Y 2 + (Y ā’ Z)2

= E [X 2 | G] ā’ 2Y E [X | G] + Y 2 + (Y ā’ Z)2

= E [X 2 | G] ā’ 2E [XY | G] + E [Y 2 | G] + (Y ā’ Z)2

= E [(X ā’ Y )2 | G] + (Y ā’ Z)2 .

We also used the fact that Y is G measurable. Taking expectations and using Proposition

3.5(4),

E [(X ā’ Z)2 ] = E [(X ā’ Y )2 ] + E [(Y ā’ Z)2 ].

The right hand side is bigger than or equal to E [(X − Y)²] because (Y − Z)² ≥ 0. So the error in predicting X by Z is at least as large as the error in predicting X by Y, and they are equal if and only if Z = Y. So Y is the best predictor.
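As a quick numerical illustration (not part of the argument), here is a minimal Python sketch; the toy model, in which W generates G and X is W plus independent noise so that E [X | G] = W, is an assumption made just for this example. It estimates the mean square error of a few G-measurable predictors and shows that E [X | G] does best.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    W = rng.choice([0.0, 1.0, 2.0], size=n)   # W generates the sigma-field G
    X = W + rng.standard_normal(n)            # X = W + independent N(0,1) noise
    Y = W                                     # E[X | G] = W in this toy model

    def mse(Z):
        # estimate of the mean square error E[(X - Z)^2]
        return np.mean((X - Z) ** 2)

    print("Z = 0        :", mse(np.zeros(n)))
    print("Z = W/2      :", mse(W / 2))
    print("Z = E[X | G] :", mse(Y))           # the smallest, approximately 1

The last error is roughly the variance of the noise, and any other G-measurable predictor comes out larger, in line with Proposition 3.8.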


There is one more interpretation of conditional expectation that may be useful. The

collection of all random variables is a linear space, and the collection of all G-measurable

random variables is clearly a subspace. Given X, the conditional expectation Y = E [X | G]

is equal to the projection of X onto the subspace of G-measurable random variables. To

see this, we write X = Y + (X − Y), and what we have to check is that the inner product of Y and X − Y is 0, that is, Y and X − Y are orthogonal. In this context, the inner product of X1 and X2 is defined to be E [X1 X2 ], so we must show E [Y (X − Y)] = 0. Note

E [Y (X − Y) | G] = Y E [X − Y | G] = Y (E [X | G] − Y) = Y (Y − Y) = 0.

Taking expectations,

E [Y (X ā’ Y )] = E [E [Y (X ā’ Y ) | G] ] = 0,

just as we wished.
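In the same spirit, one can check the orthogonality relation numerically; the following sketch reuses the same assumed toy model (W generating G, X equal to W plus mean-zero noise).

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.choice([0.0, 1.0, 2.0], size=200_000)   # W generates G
    X = W + rng.standard_normal(W.size)             # X = W + mean-zero noise
    Y = W                                           # Y = E[X | G] here

    # Y and X - Y should be orthogonal in the inner product E[X1 X2]
    print("E[Y(X - Y)] ~", np.mean(Y * (X - Y)))    # close to 0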

If Y is a discrete random variable, that is, it takes only countably many values y1 , y2 , . . ., we let Bi = (Y = yi ). These will be disjoint sets whose union is Ω. If σ(Y) is the collection of all unions of the Bi , then σ(Y) is a σ-field, and is called the σ-field generated by Y. It is easy to see that this is the smallest σ-field with respect to which Y is measurable. We write E [X | Y ] for E [X | σ(Y)].
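The recipe for discrete Y can be carried out directly: average X over each set Bi = (Y = yi ). Here is a small Python sketch on simulated data; the particular distributions of X and Y are assumptions made only for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    Y = rng.integers(0, 3, size=100_000)      # discrete Y taking the values 0, 1, 2
    X = Y + rng.standard_normal(Y.size)       # an integrable r.v. correlated with Y

    # E[X | Y] is constant on each B_i = (Y = y_i), with value E[X; B_i] / P(B_i)
    cond_exp = np.empty_like(X)
    for y in np.unique(Y):
        Bi = (Y == y)
        cond_exp[Bi] = X[Bi].mean()           # sample version of E[X; B_i] / P(B_i)

    print(np.unique(cond_exp.round(2)))       # roughly [0, 1, 2], the values of E[X | Y]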

Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove (3), note that if Z = X, then Z is G measurable and E [X; C] = E [Z; C] for any C ∈ G; this is trivial. By Proposition 3.4 it follows that Z = E [X | G]; this proves (3). To prove (4), if we let C = Ω and Y = E [X | G], then E Y = E [Y ; C] = E [X; C] = E X.

Last is (5). Let Z = E X. Z is constant, so clearly G measurable. By the independence, if C ∈ G, then E [X; C] = E [X 1C ] = (E X)(E 1C ) = (E X)(P(C)). But E [Z; C] = (E X)(P(C)) since Z is constant. By Proposition 3.4 we see Z = E [X | G].

Note 2. We prove Proposition 3.6. Note that ZE [X | G] is G measurable, so by Proposition 3.4 we need to show its expectation over sets C in G is the same as that of XZ. As in the proof of Proposition 3.3, it suffices to consider only the case when C is one of the Bi . Now Z is G measurable, hence it is constant on Bi ; let its value be zi . Then

E [ZE [X | G]; Bi ] = E [zi E [X | G]; Bi ] = zi E [E [X | G]; Bi ] = zi E [X; Bi ] = E [XZ; Bi ]

as desired.


4. Martingales.

Suppose we have a sequence of σ-fields F1 ⊂ F2 ⊂ F3 ⊂ · · ·. An example would be repeatedly tossing a coin and letting Fk be the sets that can be determined by the first k tosses. Another example is to let Fk be the events that are determined by the values of a stock at times 1 through k. A third example is to let X1 , X2 , . . . be a sequence of random variables and let Fk be the σ-field generated by X1 , . . . , Xk , the smallest σ-field with respect to which X1 , . . . , Xk are measurable.

Definition 4.1. A r.v. X is integrable if E |X| < ∞. Given an increasing sequence of σ-fields Fn , a sequence of r.v.'s Xn is adapted if Xn is Fn measurable for each n.

Definition 4.2. A martingale Mn is a sequence of random variables such that

(1) Mn is integrable for all n,

(2) Mn is adapted to Fn , and

(3) for all n

E [Mn+1 | Fn ] = Mn . (4.1)

Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. If

we have (1) and (2), but instead of (3) we have

(3′) for all n

E [Mn+1 | Fn ] ≥ Mn ,

then we say Mn is a submartingale. If we have (1) and (2), but instead of (3) we have

(3′′) for all n

E [Mn+1 | Fn ] ≤ Mn ,

then we say Mn is a supermartingale.

Submartingales tend to increase and supermartingales tend to decrease. The nomenclature may seem like it goes the wrong way; Doob defined these terms by analogy with the notions of subharmonic and superharmonic functions in analysis. (Actually, it is more than an analogy: we won't explore this, but it turns out that the composition of a subharmonic function with Brownian motion yields a submartingale, and similarly for superharmonic functions.)

Note that the definition of martingale depends on the collection of σ-fields. When it is needed for clarity, one can say that (Mn , Fn ) is a martingale. To define conditional expectation, one needs a probability, so a martingale depends on the probability as well. When we need to, we will say that Mn is a martingale with respect to the probability P. This is an issue when there is more than one probability around.

We will see that martingales are ubiquitous in financial math. For example, security prices and one's wealth will turn out to be examples of martingales.


The word "martingale" is also used for the piece of a horse's bridle that runs from the horse's head to its chest. It keeps the horse from raising its head too high. It turns out that martingales in probability cannot get too large. The word also refers to a gambling system. I did some searching on the Internet, and there seems to be no consensus on the derivation of the term.

Here is an example of a martingale. Let X1 , X2 , . . . be a sequence of independent r.v.'s with mean 0. (Saying a r.v. Xi has mean 0 is the same as saying E Xi = 0; this presupposes that E |Xi | is finite.) Set Fn = σ(X1 , . . . , Xn ), the σ-field generated by X1 , . . . , Xn . Let Mn = X1 + · · · + Xn . Definition 4.2(2) is easy to see. Since E |Mn | ≤ E |X1 | + · · · + E |Xn |, Definition 4.2(1) also holds. We now check

E [Mn+1 | Fn ] = X1 + · · · + Xn + E [Xn+1 | Fn ] = Mn + E Xn+1 = Mn ,

where we used the independence.
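As a quick sanity check on this example (purely illustrative; the ±1 steps and the sample sizes are arbitrary choices), one can simulate many paths of Mn and confirm that E Mn stays at 0 for every n, as it must for a martingale started at 0.

    import numpy as np

    rng = np.random.default_rng(3)
    paths, steps = 100_000, 20
    X = rng.choice([-1.0, 1.0], size=(paths, steps))   # independent steps with mean 0
    M = X.cumsum(axis=1)                               # M_n = X_1 + ... + X_n

    print(np.round(M.mean(axis=0)[:5], 3))             # approximately 0 for each n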

Another example: suppose in the above that the Xk all have variance 1, and let Mn = Sn² − n, where Sn = X1 + · · · + Xn . Again (1) and (2) of Definition 4.2 are easy to check. We compute

E [Mn+1 | Fn ] = E [Sn² + 2Xn+1 Sn + Xn+1² | Fn ] − (n + 1).

We have E [Sn² | Fn ] = Sn² since Sn is Fn measurable.

E [2Xn+1 Sn | Fn ] = 2Sn E [Xn+1 | Fn ] = 2Sn E Xn+1 = 0.

And E [Xn+1² | Fn ] = E Xn+1² = 1. Substituting, we obtain E [Mn+1 | Fn ] = Sn² − n = Mn , so Mn is a martingale.
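The same kind of check works here (again illustrative; ±1 steps are one choice of variance-1 steps): the simulated averages of Mn = Sn² − n stay near 0.

    import numpy as np

    rng = np.random.default_rng(4)
    paths, steps = 100_000, 20
    X = rng.choice([-1.0, 1.0], size=(paths, steps))   # mean 0, variance 1
    S = X.cumsum(axis=1)                               # S_n = X_1 + ... + X_n
    M = S ** 2 - np.arange(1, steps + 1)               # M_n = S_n^2 - n

    print(np.round(M.mean(axis=0)[:5], 3))             # approximately 0 for each n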

A third example: Suppose you start with a dollar and you are tossing a fair coin

independently. If it turns up heads you double your fortune, tails you go broke. This is

"double or nothing." Let Mn be your fortune at time n. To formalize this, let X1 , X2 , . . . be independent r.v.'s that are equal to 2 with probability 1/2 and 0 with probability 1/2. Then Mn = X1 · · · Xn . Let Fn be the σ-field generated by X1 , . . . , Xn . Note 0 ≤ Mn ≤ 2ⁿ, and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional expectation,

note E Xn+1 = 1. Then

E [Mn+1 | Fn ] = Mn E [Xn+1 | Fn ] = Mn E Xn+1 = Mn ,

using the independence.
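A simulation of the double-or-nothing game (the sample sizes are arbitrary choices) illustrates the point: E Mn stays near 1 for every n, even though the fraction of paths not yet broke at time n is only (1/2)ⁿ.

    import numpy as np

    rng = np.random.default_rng(5)
    paths, steps = 400_000, 10
    X = rng.choice([0.0, 2.0], size=(paths, steps))    # 0 or 2, each with probability 1/2
    M = X.cumprod(axis=1)                              # M_n = X_1 * ... * X_n, the fortune

    print(np.round(M.mean(axis=0), 3))                 # approximately 1 for each n
    print((M[:, -1] > 0).mean())                       # fraction still solvent, about 2**-10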

Before we give our fourth example, let us observe that

|E [X | F]| ≤ E [|X| | F]. (4.2)

To see this, we have −|X| ≤ X ≤ |X|, so −E [|X| | F] ≤ E [X | F] ≤ E [|X| | F]. Since

E [|X| | F] is nonnegative, (4.2) follows.

Our fourth example will be used many times, so we state it as a proposition.


Proposition 4.3. Let F1 , F2 , . . . be given and let X be a fixed r.v. with E |X| < ∞. Let Mn = E [X | Fn ]. Then Mn is a martingale.

Proof. Definition 4.2(2) is clear, while

E |Mn | ≤ E [E [|X| | Fn ]] = E |X| < ∞

by (4.2); this shows Definition 4.2(1). We have

E [Mn+1 | Fn ] = E [E [X | Fn+1 ] | Fn ] = E [X | Fn ] = Mn .
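For a concrete instance of Proposition 4.3 (the setup below is an assumption made only for illustration): take three fair coin tosses, let X be the total number of heads, and let Fn carry the information in the first n tosses. Then Mn = E [X | Fn ] is the number of heads seen so far plus (3 − n)/2, and a simulation confirms that E Mn = E X for every n, as must hold for this martingale.

    import numpy as np

    rng = np.random.default_rng(7)
    tosses = rng.integers(0, 2, size=(200_000, 3))   # three fair coin tosses per path
    X = tosses.sum(axis=1)                           # X = total number of heads

    # M_n = E[X | F_n] = (heads in the first n tosses) + (3 - n)/2
    for n in range(4):
        Mn = tosses[:, :n].sum(axis=1) + (3 - n) / 2
        print(f"E M_{n} =", round(Mn.mean(), 3))     # all approximately 1.5 = E X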


5. Properties of martingales.

When it comes to discussing American options, we will need the concept of stopping

times. A mapping τ from Ω into the nonnegative integers is a stopping time if (τ = k) ∈ Fk

for each k.

An example is τ = min{k : Sk ≥ A}. This is a stopping time because (τ = k) = (S1 < A, . . . , Sk−1 < A, Sk ≥ A) ∈ Fk . We can think of a stopping time as the first time something happens. σ = max{k : Sk ≥ A}, the last time, is not a stopping time. (We will use the convention that the minimum of an empty set is +∞; so, for example, with the above definition of τ, on the event that Sk is never ≥ A, we have τ = ∞.)
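For a concrete feel for this (the random walk path below is just an arbitrary example), the first time Sk ≥ A can be found by scanning the path and stopping as soon as the condition holds, using only the values seen so far; finding the last such time would require looking at the whole path first.

    import numpy as np

    rng = np.random.default_rng(6)
    S = rng.choice([-1, 1], size=50).cumsum()   # one path of a random walk S_1, ..., S_50
    A = 3

    tau = np.inf                                # convention: min of the empty set is +infinity
    for k, s in enumerate(S, start=1):
        if s >= A:                              # decided using only S_1, ..., S_k
            tau = k
            break

    print("tau =", tau)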

Here is an intuitive description of a stopping time. If I tell you to drive to the city limits and then drive until you come to the second stop light after that, you know when you get there that you have arrived; you don't need to have been there before or to look ahead. But if I tell you to drive until you come to the second stop light before the city limits, either you must have been there before or else you have to go past where you are supposed to stop, continue on to the city limits, and then turn around and come back two stop lights. You don't know when you first get to the second stop light before the city limits that you get to stop there. The first set of instructions forms a stopping time, the second set does not.

Note (τ ≤ k) = (τ = 0) ∪ (τ = 1) ∪ · · · ∪ (τ = k). Since (τ = j) ∈ Fj ⊂ Fk for each j ≤ k, the event (τ ≤ k) ∈ Fk for all k. Conversely, if τ is a r.v. with (τ ≤ k) ∈ Fk for all k, then

(τ = k) = (τ ≤ k) − (τ ≤ k − 1).

Since (τ ≤ k) ∈ Fk and (τ ≤ k − 1) ∈ Fk−1 ⊂ Fk , we have (τ = k) ∈ Fk , and such a τ must be a stopping time.

Our first result is Jensen's inequality.

Proposition 5.1. If g is convex, then

g(E [X | G]) ≤ E [g(X) | G]

provided all the expectations exist.

For ordinary expectations rather than conditional expectations, this is still true.

That is, if g is convex and the expectations exist, then

g(E X) ≤ E [g(X)].

We already know some special cases of this: when g(x) = |x|, this says |E X| ≤ E |X|; when g(x) = x², this says (E X)² ≤ E X², which we know because E X² − (E X)² = E (X − E X)² ≥ 0.


For Proposition 5.1 as well as many of the following propositions, the statement of

the result is more important than the proof, and we relegate the proof to Note 1 below.

One reason we want Jensen's inequality is to show that a convex function applied

to a martingale yields a submartingale.

Proposition 5.2. If Mn is a martingale and g is convex, then g(Mn ) is a submartingale,

provided all the expectations exist.

Proof. By Jensen's inequality,

E [g(Mn+1 ) | Fn ] ≥ g(E [Mn+1 | Fn ]) = g(Mn ).

If Mn is a martingale, then E Mn = E [E [Mn+1 | Fn ]] = E Mn+1 . So E M0 =

E M1 = · · · = E Mn . Doob's optional stopping theorem says the same thing holds when fixed times n are replaced by stopping times.

Theorem 5.3. Suppose K is a positive integer, N is a stopping time such that N ≤ K

a.s., and Mn is a martingale. Then

E MN = E MK .

Here, to evaluate MN , one first finds N (ω) and then evaluates Mn (ω) for n = N (ω).
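Before turning to the proof, here is a small Monte Carlo sanity check of the conclusion (only illustrative; the ±1 walk, the horizon K = 20, and the level A = 3 are arbitrary choices). With N the first time the walk reaches A, capped at K so that N ≤ K, the two estimated expectations agree.

    import numpy as np

    rng = np.random.default_rng(8)
    paths, K, A = 200_000, 20, 3
    X = rng.choice([-1.0, 1.0], size=(paths, K))
    M = X.cumsum(axis=1)                                       # the martingale M_1, ..., M_K

    hit = M >= A
    N = np.where(hit.any(axis=1), hit.argmax(axis=1), K - 1)   # index of first hit, else time K
    M_N = M[np.arange(paths), N]                               # M evaluated at the stopping time

    print("E M_N ~", round(M_N.mean(), 3))                     # both approximately 0
    print("E M_K ~", round(M[:, -1].mean(), 3))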

Proof. We have

E MN = E [MN ; N = 0] + E [MN ; N = 1] + · · · + E [MN ; N = K].

If we show that the k-th summand is E [MK ; N = k], then the sum will be

E [MK ; N = 0] + · · · + E [MK ; N = K] = E MK

as desired. We have

E [MN ; N = k] = E [Mk ; N = k]

by the definition of MN . Now (N = k) is in Fk , so by Proposition 2.2 and the fact that

Mk = E [Mk+1 | Fk ],

E [Mk ; N = k] = E [Mk+1 ; N = k].

We have (N = k) ∈ Fk ⊂ Fk+1 . Since Mk+1 = E [Mk+2 | Fk+1 ], Proposition 2.2 tells us

that

E [Mk+1 ; N = k] = E [Mk+2 ; N = k].
