tragedy, all seven happen to die in an avalanche caused by a submerged

volcano. What is the effectiveness of your treatment on cancer mortality? Just

looking at the data, it is tempting to say there is a 7 percent mortality rate.

However, this mortality is clearly not related to the treatment, so the answer

does not feel right.

And, in fact, the answer is not right. This is an example of competing risks. A

study participant might live, might die of cancer, or might die of a mountain

climbing accident on a distant island. Or the patient might move to Tahiti and

drop out of the study. As medical researchers say, such a patient has been “lost

to follow-up.”

The solution is to censor the patients who exit the study before the event

being studied occurs. If patients drop out of the study, then they were healthy

to the point in time when they dropped out, and the information acquired

during this period can be used to calculate hazards. Afterward there is no way of

knowing what happened. They are censored at the point when they exit. If a

patient dies of something else, then he or she is censored at the point when

death occurs, and the death is not included in the hazard calculation.

TIP The right way to deal with competing risks is to develop a separate set of

hazards for each risk, with the other risks censored.

Competing risks are familiar in the business environment as well. For

instance, there are often two types of stops: voluntary stops, when a customer

decides to leave, and involuntary stops, when the company decides a

customer should leave, often due to unpaid bills.

In doing an analysis on voluntary churn, what happens to customers who

are forced to discontinue their relationships due to unpaid bills? If such a

customer were forced to stop on day 100, then that customer did not stop

voluntarily on days 1 through 99. This information can be used to generate hazards for

voluntary stops. However, starting on day 100, the customer is censored, as

shown in Figure 12.8. Censoring customers, even when they have stopped for

other reasons, makes it possible to understand different types of stops.
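The censoring rule for competing risks can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the chapter: the function name `hazards_for_risk`, the `(tenure, outcome)` record layout, and the outcome labels are all assumptions made for the example. It also assumes that a still-active customer counts as at risk at his or her current tenure.

```python
def hazards_for_risk(customers, risk):
    """Discrete-time hazards for one stop type, censoring the other types.

    customers: list of (tenure, outcome) pairs; outcome is a stop type such
    as "voluntary" or "involuntary", or None for a still-active customer.
    Returns {t: hazard} for every t that has a nonempty population at risk.
    """
    horizon = max(tenure for tenure, _ in customers)
    hazards = {}
    for t in range(1, horizon + 1):
        # Stops of the type being studied that happen exactly at time t.
        stops = sum(1 for tenure, outcome in customers
                    if tenure == t and outcome == risk)
        # Customers who stop at t for a competing reason are censored at t,
        # so they leave the population at risk without counting as a stop.
        censored_now = sum(1 for tenure, outcome in customers
                           if tenure == t and outcome not in (risk, None))
        at_risk = sum(1 for tenure, _ in customers if tenure >= t) - censored_now
        if at_risk > 0:
            hazards[t] = stops / at_risk
    return hazards


customers = [
    (100, "involuntary"),  # forced to stop on day 100
    (50, "voluntary"),
    (100, "voluntary"),
    (120, None),           # still active
]
voluntary = hazards_for_risk(customers, "voluntary")
print(voluntary[50], voluntary[99], voluntary[100])  # 0.25 0.0 0.5
```

The customer forced out on day 100 still counts in the population at risk on days 1 through 99, but is neither a stop nor at risk on day 100 itself. Calling the same function with `"involuntary"` produces the second set of hazards, with the voluntary stops censored instead.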

404 Chapter 12

[Figure 12.8 annotations: These two customers were forced to leave, so they are censored at the point of attrition instead of being considered stopped. All the data from before they left is included in the calculation of the hazard functions for voluntary attrition, since they remained customers until then. Horizontal axis: time.]

Figure 12.8 Using censoring makes it possible to develop hazard models for voluntary

attrition that include customers who were forced to leave.

From Hazards to Survival

This chapter started with a discussion of retention curves. From the hazard

functions, it is possible to create a very similar curve, called the survival curve.

The survival curve is more useful and in many senses more accurate.

Retention

A retention curve provides information about how many customers have been

retained for a certain amount of time. One common way of creating a retention

curve is to do the following:

- For customers who started 1 week ago, measure the 1-week retention.

- For customers who started 2 weeks ago, measure the 2-week retention.

- And so on.

Figure 12.9 shows an example of a retention curve based on this approach.

The overall shape of this curve looks appropriate. However, the curve itself is

quite jagged. It seems odd, for instance, that 10-week retention would be

better than 9-week retention, as suggested by this data.
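A small sketch shows how this cohort-by-cohort calculation can produce a curve that rises. The function name `retention_curve`, the `(weeks_since_start, is_active)` record layout, and the numbers are all made up for illustration.

```python
def retention_curve(customers, max_weeks):
    """Retention at n weeks, using only customers who started n weeks ago.

    customers: list of (weeks_since_start, is_active) pairs.
    Returns {n: share of that cohort still active}.
    """
    curve = {}
    for n in range(1, max_weeks + 1):
        cohort = [active for weeks_ago, active in customers if weeks_ago == n]
        if cohort:
            curve[n] = sum(cohort) / len(cohort)
    return curve


# Hypothetical data: the 9-week cohort happens to be worse than the 10-week one.
customers = (
    [(9, True)] * 4 + [(9, False)] * 6
    + [(10, True)] * 6 + [(10, False)] * 4
)
print(retention_curve(customers, max_weeks=10))  # {9: 0.4, 10: 0.6}
```

Each point on the curve comes from a different group of customers, so nothing forces the 10-week value to be at or below the 9-week value.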

Hazard Functions and Survival Analysis in Marketing 405

[Chart: retention (vertical axis, 0% to 100%) versus tenure in weeks (horizontal axis, 0 to 110)]

Figure 12.9 A retention curve might be quite jagged.

Actually, it is more than odd; it violates the very notion of retention. For

instance, it opens the possibility that the curve will cross the 50 percent

threshold more than once, leading to the odd, and inaccurate, conclusion that there

is more than one median lifetime, or that the average retention for customers

during the first 10 weeks after they start might be more than the average for

the first 9 weeks. What is happening? Are customers being reincarnated?

These problems are an artifact of the way the curve was created. Customers

acquired in any given time period may be better or worse than the

customers acquired in other time periods. For instance, perhaps 9 weeks ago

there was a special pricing offer that brought in bad customers. Customers

who started 10 weeks ago were the usual mix of good and bad, but those who

started 9 weeks ago were particularly bad. So, a smaller share of the bad

customers remains after 9 weeks than of the better customers after 10 weeks.

The quality of customers might also vary due merely to random variation.

After all, in the previous figure, there are over 100 time periods being

considered, so, all things being equal, some time periods would be expected

to exhibit differences.

A compounding reason is that marketing efforts change over time, attracting

different qualities of customers. For instance, customers arriving by different

channels often have different retention characteristics, and the mix of

customers from different channels is likely to change over time.

Survival

Hazards give the probability that a customer might stop at a particular point in

time. Survival, on the other hand, gives the probability of a customer surviving

up to that time. Survival values are calculated directly from the hazards.


At any point in time, the chance that a customer survives to the next unit of

time is simply 1 - hazard, which is called conditional survival at time t (it is

conditional because it assumes that the customers survived up to time t).

Calculating the full survival at a given time requires accumulating all the

conditional survivals up to that point in time by multiplying them together. The

survival value starts at 1 (or 100 percent) at time 0, since all customers included

in the analysis survive to the beginning of the analysis.

Since the hazard is always between 0 and 1, the conditional survival is also

between 0 and 1. Hence, survival itself never increases, because each

successive value is the previous one multiplied by a number no greater than 1.

The survival curve itself starts at 1 and gently goes down, perhaps flattening

out at times, but never rising.
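The multiplication of conditional survivals can be written as a short loop. This is a sketch under one stated indexing assumption: `hazards[t]` is the hazard at time t, and `survival[t + 1]` is `survival[t]` times `1 - hazards[t]`. The function name and the hazard values are illustrative only.

```python
def survival_from_hazards(hazards):
    """Turn a list of hazard probabilities into a survival curve.

    hazards[t] is the hazard at time t (t = 0, 1, 2, ...).
    The result has one more entry than hazards, starting with survival 1.0
    at time 0.
    """
    survival = [1.0]
    for hazard in hazards:
        conditional_survival = 1.0 - hazard
        survival.append(survival[-1] * conditional_survival)
    return survival


survival = survival_from_hazards([0.1, 0.2, 0.0, 0.5])
print([round(s, 4) for s in survival])  # [1.0, 0.9, 0.72, 0.72, 0.36]
```

The zero hazard at time 2 leaves the curve flat for one step; no hazard value between 0 and 1 can make it rise.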

Survival curves make more sense for customer retention purposes than the

retention curves described earlier. Figure 12.10 shows a survival curve and its

corresponding retention curve. It is clear that the survival curve is smoother,

and that it slopes downward at all times. The retention curve bounces all over

the place.

The differences between the retention curve and the survival curve may, at

first, seem nonintuitive. The retention curve is actually pasting together a

whole bunch of different pictures of customers from the past, like a photo

collage pieced together from a bunch of different photographs to get a panoramic

image. In the collage, the picture in each photo is quite clear. However, the

boundaries do not necessarily fit together smoothly. Different pictures in the

collage look different, because of differences in lighting or perspective,

differences that contribute to the aesthetic of the collage.

[Chart: retention and survival (vertical axis, 0% to 100%) versus tenure in weeks (horizontal axis, 0 to 110)]

Figure 12.10 A survival curve is smoother than a retention curve.


The same thing is happening with retention curves, where customers who

start at different points in time have different perspectives. Any given point on

the retention curve is close to the actual retention value; however, taken as a

whole, it looks jagged. One way to remove the jaggedness is to focus on

customers who start at about the same time, as suggested earlier in this chapter.

However, this greatly reduces the amount of data contributing to the curve.

TIP Instead of using retention curves, use survival curves. That is, first

calculate the hazards and then work back to calculate the survival curve.

The survival curve, on the other hand, looks at as many customers as possible,

not just the ones who started exactly n time periods ago. The survival at

any given point in time t uses information from all customers. The hazard at

time t uses information from all customers whose tenure is greater than or

equal to that value (assuming all are in the population at risk). Survival,

though, is calculated by combining all the information for hazards from

smaller values of t.
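A minimal life-table sketch makes the pooling visible by reporting the size of the population at risk next to each hazard. The function name `life_table`, the `(tenure, stopped)` record layout, and the sample data are assumptions for illustration. Following the description above, every customer whose tenure is at least t is treated as at risk at time t.

```python
def life_table(customers):
    """Hazard and survival for each tenure, pooling all customers.

    customers: list of (tenure, stopped) pairs, where stopped is True for
    customers who stopped at that tenure and False for active customers.
    Returns rows of (t, at_risk, stops, hazard, survival), where survival
    is the probability of surviving past time t.
    """
    horizon = max(tenure for tenure, _ in customers)
    rows = []
    survival = 1.0
    for t in range(1, horizon + 1):
        # Everyone with tenure >= t contributes, whenever they started.
        at_risk = sum(1 for tenure, _ in customers if tenure >= t)
        stops = sum(1 for tenure, stopped in customers
                    if stopped and tenure == t)
        hazard = stops / at_risk
        survival *= 1.0 - hazard
        rows.append((t, at_risk, stops, hazard, survival))
    return rows


customers = [(1, True), (2, False), (2, True), (3, False), (3, False)]
for row in life_table(customers):
    print(row)
```

In this made-up data, five customers inform the hazard at t = 1, four at t = 2, and two at t = 3, regardless of when each customer started.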

Because survival calculations use all the data, the values are more stable

than retention calculations. Each point on a retention curve limits customers to

having started at a particular point in time. Also, because a survival curve

always slopes downward, calculations of customer half-life and average

customer tenure are more accurate. By incorporating more information, survival

provides a more accurate, smoother picture of customer retention.

When analyzing customers, both hazards and survival provide valuable

information. Because survival is cumulative, it gives a good

summary value for comparing different groups of customers: How does the

1-year survival compare among different groups? Survival is also used for

calculating customer half-life and mean customer tenure, which in turn feed

into other calculations, such as customer value.
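Both summary values can be read off a survival curve with a few lines of code. This sketch assumes `survival[t]` is the probability of surviving to time t, with `survival[0]` equal to 1. It takes the half-life to be the first time at which survival is at or below 50 percent, and approximates mean tenure, truncated at the end of the curve, by summing the survival values after time 0. The function names are hypothetical.

```python
def half_life(survival):
    """First time index at which survival is at or below 50 percent.

    Returns None if the curve never gets that low.
    """
    for t, value in enumerate(survival):
        if value <= 0.5:
            return t
    return None


def mean_tenure(survival):
    """Mean tenure truncated at the end of the curve.

    Each customer who survives to time t contributes one unit of tenure,
    so the mean is the sum of the survival values for t >= 1.
    """
    return sum(survival[1:])


survival = [1.0, 0.9, 0.72, 0.72, 0.36]
print(half_life(survival))              # 4
print(round(mean_tenure(survival), 2))  # 2.7
```

Because a survival curve never rises, it crosses the 50 percent line at most once, so the half-life is well defined whenever the curve reaches that level.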

Because survival is cumulative, it is difficult to see patterns at a particular

point in time. Hazards make the specific causes much more apparent. When

discussing some real-world hazards, it was possible to identify events during

the customer life cycle that were drivers of hazards. Survival curves do not

highlight such events as clearly as hazards do.

The question may also arise about comparing hazards for different groups

of customers. It does not make sense to compare average hazards over a

period of time. Mathematically, “average hazard” does not make sense. The

right approach is to turn the hazards into survival and compare the values on

the survival curves.

The description of hazards and survival presented so far differs a bit from

how the subject is treated in statistics. The sidebar “A Note about Survival

Analysis and Statistics” explains the differences further.


A NOTE ABOUT SURVIVAL ANALYSIS AND STATISTICS

The discussion of survival analysis in this chapter assumes that time is discrete.

In particular, things happen on particular days, and the particular time of day is

not important. This is not only reasonable for the problems addressed by data

mining, but it is also more intuitive and simplifies the mathematics.

In statistics, though, survival analysis makes the opposite assumption, that

time is continuous. Instead of hazard probabilities, statisticians work with

hazard rates, which are turned into survival curves by using exponentiation and

integration. One difference between a rate and a probability is that the rate can

exceed 1, whereas a probability never does. Also, a rate seems less intuitive for

many survival problems encountered with customers.
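For readers who want the two conventions side by side, the standard relationships can be written as follows. The symbols are introduced here only for illustration: h(u) is a continuous hazard rate, h_i is a discrete hazard probability, and S(t) is survival.

```latex
% Continuous time: survival from a hazard rate h(u)
S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)

% Discrete time: survival from hazard probabilities h_i
S(t) = \prod_{i=0}^{t-1} \bigl( 1 - h_i \bigr)

% A constant rate of 2 per year exceeds 1 but still gives a valid survival curve
h(u) = 2 \;\Longrightarrow\; S(t) = e^{-2t}, \qquad S(0.5) = e^{-1} \approx 0.368
```

The last line shows why a rate above 1 causes no trouble: the exponentiation still produces a survival value between 0 and 1.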

The method for calculating hazards in this chapter is called the life table

method, and it works well with discrete time data. A very similar method, called

Kaplan-Meier, is used for continuous time data. The two techniques produce

almost exactly the same results when events occur at discrete times.

An important part of statistical survival analysis is the estimation of hazards

using parameterized regression, that is, trying to find the best functional form

for the hazards. This is an alternative to calculating the hazards directly from

the data.

The parameterized approach has the important advantage that it can more

easily include covariates in the process. Later in this chapter, there is an

example based on such a parameterized model. Unfortunately, the hazard

function rarely follows a form that would be familiar to nonstatisticians. The

hazards do such a good job of describing the customer life cycle that it would

be shocking if a simple function captured that rich complexity.

We strongly encourage interested readers who have a mathematical or

statistical background to investigate the area further.

Proportional Hazards

Sir David Cox is one of the most cited statisticians of the past century; his work

comprises numerous books and over 250 articles. He has received many

awards including a knighthood bestowed on him by Queen Elizabeth in 1985.

Much of his research centered on understanding hazard functions, and his

work has been particularly important in the world of medical research.

His seminal paper was about determining the effect of initial factors (time

zero covariates) on hazards. By assuming that these initial factors have a uni